Abstract



The Opacity Project (OP) and Iron Project (IP) are pioneering international collaborations 
which have been computing, for more than 25 years, massive atomic data sets for astrophysical 
applications. We review the data activities that have been carried out, namely curation, analysis 
and preservation, and the development of databases and computer tools for data dissemination 
and end-user processing. New opportunities within the current data-intensive boom referred 
to as e-science are described, in particular the Virtual Atomic and Molecular Data Center 
(VAMDC) that has been recently launched to consolidate and promote atomic and molecular
database services.

Key words: atomic data; Opacity Project; Iron Project; laboratory astrophysics; databases;
e-science; virtual data centers.

Introduction 

It is an honor to join the celebrations of the 50th Anniversary of the Instituto Venezolano de
Investigaciones Científicas (IVIC) by contributing a review of the data activities of the
Opacity Project (OP) [137, 138] and Iron Project (IP) [71]. Since the beginning of the 80s
(OP) and through the 90s (IP), these international consortia have been dedicated to the
computation of massive atomic data sets for astrophysical applications. In reference to the title,
by data activities we mean compilations, databases and data curation, analysis, dissemination and
preservation. They have been supported by the Centre de Données astronomiques de Strasbourg
(CDS), France, the Ohio Supercomputer Center (OSC), USA, the Centro Científico de
IBM de Venezuela, the Centro Nacional de Cálculo Científico de la Universidad de Los Andes
(CeCalCULA) and the Centro de Física of IVIC.

It is also timely to pay our respects to Mike Seaton, the leader of the OP and a central
reference for the IP, who passed away in May 2007. As can be appreciated from the long lists of
publications, both the OP and IP have been widely reviewed, most recently at the meeting
"Atoms and Astrophysics: Mike Seaton's Legacy", held at University College London, UK, on
14-15 April 2008, to reminisce over his monumental scientific contributions. A collection of
the papers presented at this event, edited by Pete Storey and Phil Burke (see Ref. [131] and
references therein), acclaims the OP as one of his major lifetime projects. What has perhaps
been overlooked thus far are his data management skills, views and endeavors, which
were crucial in the OP and have been a source of encouragement to other atomic data ventures
since then. With him in mind, we will go over the numerical methods and approximations used
in the OP and IP, highlight some of the main findings, particularly those resulting from data
analyses, and describe the atomic databases and applications TOPbase, TIPbase, OPserver
and CHIANTI associated with these pioneering long-term collaborations.

If data activities were nascent and poorly funded in the 80s and 90s, they are currently
undergoing a tremendous boom as a new way of doing research, e-science [TOl [TH [59], based on
global data-mining collaborations on a second-generation Internet. Due to its foreseeable
potential, data-intensive computing is being put forward as the "fourth paradigm" in science [HS],
which intends to unify the other three: experiment, theory and computer simulation. It will
certainly give rise to new opportunities for interdisciplinary cooperation and inter-organizational
data sharing where, in the same way that scientific papers are now globally available on the
Internet, so will be the voluminous distributed data sets that are used to generate them.
Therefore, some of the discussions presented here regarding data management perspectives
are of general interest to research fields beyond those of computational atomic physics
and laboratory astrophysics.

In this challenging context, the Virtual Atomic and Molecular Data Center (VAMDC)
was launched in July 2009 to integrate and boost atomic and molecular (A&M) databases, given
their relevance in a variety of scientific and technological fields. It has been agreed to house
the OP and IP data products and their related data-intensive applications within the VAMDC
and to adopt its general data-preservation policies, an issue of high priority since both the OP
and IP are now long past their prime. In this respect, we will go over recent innovations
such as metadata, XML schemas and distributed data reservoirs, and discuss the perspectives
of the OP/IP data services within the VAMDC and in the new cyber-infrastructure.



The Opacity Project 

The Opacity Project (OP) [137, 138] was launched as the result of a plea by Norman Simon
[129] for a revision of the metal (elements heavier than He) opacities, and was first discussed by
David Hummer, Dimitri Mihalas and Mike Seaton in Hummer's living room in Boulder,
Colorado, USA, in the summer of 1982. Simon had shown earlier that an increase by a factor
of 2 to 3 in the metal opacities would eliminate the long-standing "Cepheid mass discrepancy".
Cepheids are pulsating variable stars which are used, due to the precise relationship between
their luminosity and pulsation period, to measure extragalactic distances. However,
the Cepheid masses determined from evolutionary tracks and from pulsation properties showed
unsatisfactory differences [88]. The discussion in Boulder dealt with the details of a new equation
of state [75], which would be based on the "chemical picture", where atoms keep their identities
and the plasma effects are represented by an occupation probability formalism, and with the
feasibility of including electron correlation effects in the computation of the massive radiative
data needed to estimate opacities.

http://cdsweb.u-strasbg.fr/topbase/topbase.html
Also as a result of Simon's request, a parallel project was launched at the Lawrence Livermore
National Laboratory, CA, USA, which came to be known as OPAL [77]. By using
the method of detailed configuration accounting, where the atomic data are computed with the
Dirac equation and parametric potentials, and an equation of state [112] based on the "physical
picture" through a renormalized activity expansion of the grand canonical ensemble, OPAL removed
some of the inherent approximations in the widely used Los Alamos Astrophysical Opacity
Library [73].

The OP consortium quickly grew to involve about 30 international collaborators from
France, Germany, UK, USA and Venezuela, and met regularly on a six-monthly basis
for at least the next ten years. In a way, the OP established a milestone in the field of
computational laboratory astrophysics regarding participation in long-term international scientific
collaborations, publication of results (e.g. the series of papers "The equation of state for stellar
envelopes" and "Atomic data for opacity calculations") and data management, the latter
being the main topic of the present review.

Radiation transfer in the stellar envelope, with typical temperatures T (K) and densities
ρ (g cm^-3) in the ranges

$$ 4.5 < \log T < 6.5 \quad\text{and}\quad -8.5 < \log\rho < -1.5 \,, \tag{1} $$

can be modeled in local thermodynamic equilibrium. Hence, the specific intensity is not very
different from the Planck function B(T), and the total radiant energy flux is assumed
proportional to the temperature gradient (diffusion approximation):

$$ \mathbf{F} = -\frac{4\pi}{3\,\kappa_R\,\rho}\,\frac{\partial B}{\partial T}\,\nabla T \,. \tag{2} $$

The transfer is thus controlled by the Rosseland mean opacity κ_R,

$$ \frac{1}{\kappa_R} = \int_0^\infty \frac{g(u)}{\kappa_\nu}\,du \,, \tag{3} $$

where g(hν/kT) is the weighting function

$$ g(u) = \frac{15}{4\pi^4}\,\frac{u^4\exp(-u)}{[1-\exp(-u)]^2} \,, \tag{4} $$

and the monochromatic opacities include contributions from all the radiative absorption
processes in the plasma. It must be pointed out that the Rosseland mean is a weighted harmonic
mean that is not additive, i.e. all the contributions to the monochromatic opacity must be added
up before the integral in Eqn. (3) is computed. It also weighs the windows between absorption
lines rather than the strong absorption lines themselves, and therefore accurate representations
must be considered for the line wings, the weak lines and the absorption lines originating from
minor constituents of the plasma.
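The non-additivity of the Rosseland mean can be checked numerically. The Python sketch below evaluates Eqns. (3) and (4) with a plain trapezoidal rule for two hypothetical absorbers whose monochromatic opacities are step functions with complementary transparent windows; these toy opacities are assumptions chosen for illustration, not OP data. Adding the two Rosseland means, instead of first adding the monochromatic opacities, badly underestimates the mean of the mixture, because each absorber's mean is dominated by its own window.

```python
import numpy as np

def weighting_function(u):
    """Rosseland weight g(u) of Eqn. (4), with u = h*nu/(k*T)."""
    return 15.0 / (4.0 * np.pi**4) * u**4 * np.exp(-u) / (1.0 - np.exp(-u)) ** 2

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids version-specific NumPy helpers)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def rosseland_mean(u, kappa_u):
    """Weighted harmonic mean of Eqn. (3): 1/kappa_R = integral of g(u)/kappa_u du."""
    return 1.0 / trapezoid(weighting_function(u) / kappa_u, u)

# Toy frequency mesh; g(u) is negligible beyond u ~ 30.
u = np.linspace(1e-3, 30.0, 20001)

# Two hypothetical absorbers with complementary transparent "windows".
kappa_a = np.where(u < 4.0, 100.0, 1.0)
kappa_b = np.where(u < 4.0, 1.0, 100.0)

k_a = rosseland_mean(u, kappa_a)
k_b = rosseland_mean(u, kappa_b)
k_mix = rosseland_mean(u, kappa_a + kappa_b)  # correct: sum monochromatic opacities first

print(f"normalization of g = {trapezoid(weighting_function(u), u):.6f}")
print(f"k_R(a) + k_R(b)    = {k_a + k_b:.2f}")
print(f"k_R(a + b)         = {k_mix:.2f}")
```

Since the summed toy opacity is constant (101) at every u, `k_mix` recovers that constant to within the quadrature error, whereas `k_a + k_b` comes out roughly twenty-five times smaller.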
The required radiative atomic data, namely level energies, f-values, photoionization cross
sections and line broadening parameters, were computed in the OP with the close-coupling
method [36], where the wave function for a state of an ionic N-electron target and a colliding
electron with total orbital angular momentum, spin and parity SLπ is expanded in terms of
the target eigenfunctions

$$ \Psi(SL\pi) = \mathcal{A}\sum_i \chi_i\,\frac{F_i(r)}{r} + \sum_j c_j\,\Phi_j \,. $$

The functions χ_i are vector-coupled products of the target eigenfunctions and the angular
components of the incident-electron functions, F_i(r) are the radial parts of the latter and A is
an antisymmetrization operator. The functions Φ_j are bound-type functions of the total system
constructed with target orbitals, introduced to compensate for orthogonality conditions imposed
on the F_i(r) and to improve short-range correlations. The Kohn variational principle gives rise
to a set of coupled integro-differential equations, which were solved in the OP with the R-matrix
method [21, 22, 23, 35] and a series of sophisticated asymptotic codes developed by Seaton [121]
to improve and accelerate the calculation of bound-state and continuum energies and wave
functions, and hence of the radiative data. For instance, it became possible, for the first time, to
obtain all the bound states in an energy range by scanning rather than by iterating around
initial energy guesses. It is worth emphasizing, on the other hand, that relativistic effects were
neglected in the OP R-matrix computations and LS coupling was assumed, so fine-structure
splittings were not resolved. The rationale behind this approximation was that line broadening
would blur such splittings; the price paid was the neglect of the weak intercombination
transitions that arise from relativistic spin mixing.

Data computations proceeded by assigning complete isoelectronic sequences to specific
research groups, which had to report periodically on their progress at OP meetings. Atomic data
curation in the OP was mostly managed on half-inch tapes and, in some cases, on the newer
Exabyte 8-mm cassettes and CDs, and was centralized by Seaton. He developed stringent
testing procedures in order to ensure data accuracy and completeness and to remove nonphysical
resonances in the photoionization cross sections, particularly in the difficult energy regions
below thresholds. In many cases calculations were repeated due to programming bugs or poor
target representations. The final opacity computations [127] were performed in the difficult
transition period between the twilight of the mainframes and the advent of Unix
workstations, so they required familiarity with different operating systems (IBM VM/CMS,
VAX/VMS and Unix) and fortran compilers, the end product being a CD that was publicly
available from two sites: the Observatoire de Meudon, Paris, France, and the Department of
Astronomy, Ohio State University, Columbus, OH, USA.

One of the most interesting findings in the OP was obtained by data analysis, namely in the
photoionization cross sections of excited states. The most salient features in such cross sections
were broad resonances that involved the photoexcitation of the core rather than that of the
active electron, and were consequently labeled PEC (photoexcitation-of-core) resonances. In
Figure 1 we show the OP photoionization cross sections [58] of the 1sns excited states in He I
for n = 2 to 4, where radiative absorption can give rise to the excitation of a series of 2pn's
resonances converging to the 2p threshold:

$$ 1s\,ns + \gamma \rightarrow 2p\,n's \rightarrow 1s + e \,. \tag{5} $$

Figure 1: OP photoionization cross sections of the 1sns excited states (n = 2 to 4) in He I as a
function of E = -1/ν^2, where ν is the effective quantum number. It may be seen that the cross
section of each state is dominated by a broad PEC resonance, which is assigned the term
2pn's P° with n' = n. In PEC resonances, the active electron does not take part in the radiative
absorption, and hence their width is determined by the 1s → 2p core transition. Reproduced
from Figure 7 of Ref. [58] (http://dx.doi.org/10.1088/0022-3700/20/23/032).

The PEC resonances arise when n' = n, since the active electron does not participate in the
radiative excitation, which takes place solely via the 1s → 2p core transition; therefore, the
PEC resonance widths are mostly determined by this core process and are practically
independent of n.

During the lengthy revisions, the first preliminary opacities were published not by the OP but
by OPAL [77]: a comparison of the Los Alamos and OPAL monochromatic opacities for Fe, which
we show in Figure 2. A huge bump may be seen at around 60 eV in OPAL which is absent
in Los Alamos; it is due to an unresolved transition array involving numerous states of the
type 3s^x 3p^y 3d^z in ionic species of the second row, namely in Al-, Si-, P-, S- and Cl-like Fe.
As mentioned by the authors, this bump would greatly enhance the Rosseland mean, since at
this temperature the weighting function (see Eqn. (4)) peaks at around 80 eV. Furthermore, this
outcome became a key milestone in the opacity race: firstly, it was an incentive for the two
competing teams, since Simon's hypothesis appeared to be correct and new revised opacities
were indeed required; and secondly, the large number of 3s^x 3p^y 3d^z states could not be
handled computationally at the time with the R-matrix approach. Within the OP, the latter
situation led to additional extensive calculations of f-values with the atomic structure code
SUPERSTRUCTURE [55], which uses simpler configuration-interaction wave functions of the type

$$ \Psi(SL\pi) = \sum_i c_i\,\phi_i \,, $$

where the φ_i are configuration basis functions. These new large data sets came to be known
as the "PLUS data" [85].

In spite of using very different approaches for the equation of state and the computation of
the atomic data, the independent OP and OPAL opacities turned out, after more than a decade
of heavy computations, to be in outstanding agreement, and they have been extensively
compared. The first comparisons were naturally performed in the context of pulsation
calculations, namely in B stars [108] and in beat and bump Cepheids and RR Lyrae stars [79],
where the new OPAL and OP opacities generally led to improved and mutually indistinguishable
models. At the temperatures characteristic of stellar interiors, the OPAL and OP opacities were
in close agreement [TT] and larger than previous values by as much as a factor of 3. However,
at higher temperatures and densities, such as those found in the deeper layers, OPAL was higher
than OP by around 30%, which was suggested [76] to be due to the neglect of inner-shell
transitions in the latter. This proposition was subsequently confirmed [Hj by computing new
inner-shell radiative data
for a mixture of six elements (H, He, C, O, S and Fe) with AUTOSTRUCTURE ^ i8j, an
extended version of the atomic structure code SUPERSTRUCTURE [55].

Figure 2: Comparison of Los Alamos monochromatic opacities for Fe with OPAL. Left panel:
photon absorption coefficient from the Los Alamos code (1985 version) at a density of 6.82 x 10^^
g cm^-3 and a temperature of 20 eV. Right panel: OPAL results, where a huge unresolved
transition array is observed at around 60 eV due to numerous 3-to-3 transitions not taken into
account by Los Alamos. Reproduced from Figures 1-2 of Ref. [77] (http://dx.doi.org/10.1086/185034).

A more detailed comparison [126] of OPAL and OP opacities yielded some significant
differences. For instance, although there was good global agreement for hydrogen, in the region
of log T ≈ 6 (T in K) and log R ≈ -1, where R = ρ/T_6^3 (ρ is the mass density in g cm^-3 and
T_6 = 10^-6 × T), the OPAL Rosseland mean was larger than OP by up to 13%, which was mainly
caused by the different equations of state. (For a discussion on the validity of the two equations
of state, see Ref. [HI 1140].) Similar differences occurred for He at log T ≈ 6.4 and log R ≈ -1.
It was also found that the inclusion of intercombination transitions in iron increased the
Rosseland mean by 18%, and in iron-rich mixtures, the OP "Z-bump" was located at somewhat
higher temperatures when compared to OPAL. Generally speaking, there are small differences
between OPAL and OP at the lower temperatures (log T < 5.5), which are believed to be due
to atomic data quality, OP certainly having the upper hand. At the higher temperatures and
densities (log R > -2), OP is somewhat larger due to a different equation of state.
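Since the comparisons above are expressed in terms of log T and log R rather than log ρ, a small converter may help in locating them. This Python sketch only implements R = ρ/T_6^3 with T_6 = 10^-6 T and its inverse; the function names are hypothetical.

```python
import math

def log_r(temperature_k, density_g_cm3):
    """Return log10 R, where R = rho / T6**3 and T6 = 1e-6 * T (T in K, rho in g cm^-3)."""
    t6 = 1.0e-6 * temperature_k
    return math.log10(density_g_cm3) - 3.0 * math.log10(t6)

def log_rho(log_t, log_r_value):
    """Inverse relation: log10 rho = log10 R + 3 (log10 T - 6)."""
    return log_r_value + 3.0 * (log_t - 6.0)

# The He comparison point quoted above: log T = 6.4, log R = -1.
lrho = log_rho(6.4, -1.0)
print(f"log rho = {lrho:.2f}")                      # -> log rho = 0.20
print(f"log R   = {log_r(10**6.4, 10**lrho):.2f}")  # round trip -> log R   = -1.00
```

At fixed log R, the density thus grows as T^3, so a single log R value corresponds to markedly different densities at log T = 6 and log T = 6.4.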

These conclusions led to a major revision of the OP opacities [10]. The outer-shell iron
data [85, 126] obtained in intermediate coupling with AUTOSTRUCTURE for ionic species
with electron number N = 13-18 were now included, plus inner-shell radiative data for the
chemical elements He, C, N, O, Ne, Na, Mg, Al, Si, S, Ar, Ca, Cr, Mn, Fe and Ni. An improved
frequency mesh was also introduced to ensure a high degree of accuracy in integration (better
than 1%). The contributions of inner-shell transitions may be appreciated in Figure 3 in
the high-temperature (log T > 5.5), high-density (log R > -3) tails. Moreover, as shown in
Figure 4, the revised OP opacities agreed with OPAL to better than 10%, but again, the OP
Z-bump at log T ≈ 5.2 was shifted to slightly higher temperatures.

An interesting development took place while the OP opacities were being improved. By
means of a time-dependent, three-dimensional, hydrodynamical model [B] of the solar
atmosphere, the C, N, O and Ne solar abundances were revised downward to yield a photospheric
metal mass fraction of Z = 0.0126, a value considerably lower than the widely accepted value of
Z = 0.0194 derived [H |66] from standard solar abundances. The new estimates not only sadly
destroyed the outstanding agreement between solar models and helioseismological observations
regarding the depth of the solar convection zone (respectively 0.714 R⊙ and 0.713 ± 0.001 R⊙),
by yielding a larger value (0.726 R⊙), but also led to discrepancies with the measured sound
speeds and the surface helium abundance [12]. It was suggested pj] that an increase of 11% in the
OPAL opacities from the base of the convection zone (0.7 R⊙) down to 0.4 R⊙ would solve the
problem. Thus, there was much expectation for the OP revision to confirm this hypothesis,
but the final OP opacities turned out to be less than 2.5% larger than OPAL in the region of
interest [10]. Possible sources of "missing opacity" in the context of the solar abundance problem
have been recently proposed [111], with the neglected intercombination transitions and the
relatively simple target representations used in the OP (only the ground complex was considered
in most cases) at the top of the list.

Figure 3: OP Rosseland-mean opacities for a solar mixture (S92) with and
without inner-shell transitions. Reproduced from Figure 1 of Ref. [10]
(http://dx.doi.org/10.1111/j.1365-2966.2005.08991.x).

Figure 4: OPAL and OP Rosseland-mean opacities for a solar mixture (S92), where
an excellent overall agreement is found. Reproduced from Figure 2 of Ref. [10]
(http://dx.doi.org/10.1111/j.1365-2966.2005.08991.x).

The computation of vast numbers of f-values, photoionization cross sections, line broadening
parameters and monochromatic opacities in the OP allowed the determination of radiative
accelerations, which can be used to study diffusion processes in stars [2]. Diffusion can be the
cause of surface abundance anomalies in, for instance, chemically peculiar stars (e.g. HgMn
stars). Seaton [122] used the OP radiative data to study the phenomenon of radiative
levitation and to compute envelope radiative accelerations for 15 elements [123], making publicly
available a useful suite of fortran utilities for estimating radiative acceleration parameters. He
also treated in detail [124] the envelope diffusion of iron-group elements in HgMn stars, which
changed the Rosseland-mean opacities by as much as a factor of 4, in particular producing
build-ups of iron in the region of log T ≈ 5.1 and of nickel in the outer parts of the
envelope. A comparison of Mn abundances obtained from observations of HgMn stars in the UV
[130] and optical pp with those calculated at the bottom of a model atmosphere (optical
depth τ = 1) and after 10^ years showed satisfactory agreement. It has been argued 08j that
radiative accelerations would enable further tests of the OP and OPAL opacities by comparing
such observed surface abundance anomalies with those predicted in diffusive stellar models.
Significant differences between the OP and OPAL accelerations have been recently discussed [111].

TOPbase 

Most of the work by one of us (CM) in the OP was carried out while he was a Research 
Consultant at the IBM Venezuela Scientific Center and a Visiting Fellow of IVIC. This allowed 
him to be in contact with other scientists involved in scientific computing including computer 
scientists. Thus, when the bulky OP atomic data sets were finally being gathered and tested, 
he was encouraged by this circle to become acquainted with the field of data dissemination. 

In those days, laboratory astrophysicists were familiar with the atomic data compilations
published by the US National Bureau of Standards (now the National Institute of Standards
and Technology), of which the best known were Atomic Energy Levels plj and Atomic
Transition Probabilities [141], but there was scant contact with online atomic databases or
commercial database management systems (DBMSs). Early experiences, e.g. the Belfast Atomic
Data Bank [24], were mainly concerned with data reservoirs and bibliographies for data
assessment within the context of atomic data workshops [57, 144] rather than with user data
products.

Thus, taking into consideration that the OP atomic data were likely to be employed in a
variety of research fields and for different purposes, there was at the time a definite motivation
to enhance their access and usage by developing a versatile and searchable atomic database.
However, the diversity of scientific environments, resources and operating systems implied, right
from the outset, that the proposed DBMS would have to be custom-developed in the scientific
computing lingua franca: fortran 77. In this respect, we were able to count on the enthusiastic
support of Walter Cunto, a first-class computer scientist at the IBM Scientific Center, who was
responsible for most of the design of what came to be known as "TOPbase" [15], i.e. the
definition, development, distribution and maintenance of a specific, portable, efficient and
low-cost DBMS to facilitate the intensive use (online and batch) of the OP atomic data sets.

The original version of TOPbase [15] handled around 0.5 GB of compact binary data files,
organized to speed up searches along spectroscopic series and isonuclear and isoelectronic
sequences and the sorting of energies and wavelengths. User searches were specified by means
of a simple command-based query language, which included graphic processing. Its structure
essentially contained three entities, namely term energies (e), f-values (f) and photoionization
cross sections (p), and a set of indexes, all resident in secondary storage (disk). When TOPbase
was invoked, the indexes were loaded into main memory (RAM) so that the query parser
could use them to streamline data searches. Index displays could also be employed to obtain
tables of contents of the database. Due to the data volumes involved, the main bottleneck in
system performance was data loading from secondary storage. As a consequence, the data
manipulation scheme was implemented at two levels: (i) searches in, and time-consuming block
data retrievals from, secondary storage; and (ii) cheap and versatile processing (sorting,
row/column selection, exclusion and plotting) performed iteratively in main storage in order to
satisfy the user's ultimate requirements.
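The two-level scheme can be sketched in Python (rather than the fortran 77 of the original) under explicit assumptions: a hypothetical fixed-length binary record holding (Z, N, term energy), an in-memory index that maps each ion (Z, N) to a byte offset and a record count, one seek plus one contiguous read per query for level (i), and sorting in RAM for level (ii). The record layout, the function names and the energies are illustrative only and do not reproduce TOPbase's actual file format.

```python
import os
import struct
import tempfile

# Hypothetical fixed-length record: nuclear charge Z, electron number N, term energy (Ry).
RECORD = struct.Struct("<iid")

def write_database(path, records):
    """Write records grouped by ion and return the index {(Z, N): (byte offset, count)}."""
    index = {}
    with open(path, "wb") as f:
        for z, n, energy in sorted(records, key=lambda r: (r[0], r[1])):
            offset, count = index.get((z, n), (f.tell(), 0))
            index[(z, n)] = (offset, count + 1)
            f.write(RECORD.pack(z, n, energy))
    return index

def retrieve_block(path, index, z, n):
    """Level (i): a single seek and one contiguous read from secondary storage."""
    offset, count = index[(z, n)]
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(count * RECORD.size)
    return [RECORD.unpack_from(data, k * RECORD.size) for k in range(count)]

# Illustrative (not physical) energies for two ions.
records = [(26, 13, -3.0), (8, 8, -1.0), (26, 13, -5.0), (8, 8, -2.5), (26, 13, -4.0)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "energies.bin")
    index = write_database(path, records)        # index kept in main memory
    block = retrieve_block(path, index, 26, 13)  # level (i): costly disk access

# Level (ii): cheap, repeatable processing of the retrieved block in RAM.
ascending = sorted(block, key=lambda rec: rec[2])
print([rec[2] for rec in ascending])  # -> [-5.0, -4.0, -3.0]
```

Grouping the records by ion at write time is what allows level (i) to be a single block read; every subsequent reordering touches only the block already in memory.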

The TOPbase data model is shown in Figure 5. Data compactness and fast access were
two main features, whereby main and secondary storage were managed jointly with the logical
handling of the data. Two data structures were therefore implemented in main storage: the
view and the table. A search was performed according to a user-selected criterion that
generated a subset of highly cohesive data, i.e. a view, which was loaded into special buffers
located in main memory with the cv command to allow further manipulation. Each view
had an associated descriptor to register selection criteria and view bounds, which could be
displayed on the monitor at any time. A binary image of a view could be archived on or
retrieved from disk with the ar and re commands, respectively, and the package offered data
streaming facilities to different output devices, namely the monitor, printer and a disk file,
through the dv command. Logical reorganizations of data stored in a view were possible through
the table structure. A table was a vector array that enabled or disabled data items contained in
the view as a result of user selection requirements such as se (selection), ex (exclusion) and so
(sorting). A table could also be output with the dt command on different devices (monitor,
printer or disk file), and graphic displays of its columns (pt) and of photoionization cross
sections (px) could be plotted. Tables of contents of the database, which were useful before
creating views, could be obtained by displaying the indexes (di), and a set of atomic constants
(e.g. Z-dependent Rydberg constants) was also available with the dc command.

Figure 5: TOPbase data model showing the two main data structures, the view and the table,
the display and graphic capabilities and the query commands. Reproduced from Figure 2 of
Ref. [15] with permission of © RevMexAA.
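The separation between views and tables can be mimicked in a few lines of Python. In this hypothetical sketch a View holds the rows returned by a search together with its selection criterion (playing the role of the descriptor), while a Table is merely a vector of row indices over the view, so that selection, exclusion and sorting (analogues of se, ex and so) reorganize the data logically without copying them. The class, method and field names, as well as the f-value rows, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

@dataclass
class View:
    """Subset of data produced by one search and held in main memory."""
    criterion: str  # descriptor: the selection criterion that produced the view
    rows: List[Row]

@dataclass
class Table:
    """Vector of row indices over a View; items are enabled, disabled or reordered, never copied."""
    view: View
    order: Optional[List[int]] = None

    def __post_init__(self) -> None:
        if self.order is None:
            self.order = list(range(len(self.view.rows)))  # all items enabled

    def select(self, keep: Callable[[Row], bool]) -> "Table":   # analogue of 'se'
        self.order = [i for i in self.order if keep(self.view.rows[i])]
        return self

    def exclude(self, drop: Callable[[Row], bool]) -> "Table":  # analogue of 'ex'
        self.order = [i for i in self.order if not drop(self.view.rows[i])]
        return self

    def sort(self, column: str) -> "Table":                     # analogue of 'so'
        self.order.sort(key=lambda i: self.view.rows[i][column])
        return self

    def display(self, columns: List[str]) -> List[tuple]:       # analogue of 'dt'
        return [tuple(self.view.rows[i][c] for c in columns) for i in self.order]

view = View(
    criterion="hypothetical ion, f-values",
    rows=[
        {"transition": "1-2", "wavelength": 1200.0, "gf": 0.50},
        {"transition": "1-3", "wavelength": 950.0, "gf": 0.02},
        {"transition": "2-3", "wavelength": 4300.0, "gf": 0.10},
        {"transition": "1-4", "wavelength": 800.0, "gf": 0.30},
    ],
)
table = (Table(view)
         .select(lambda r: r["gf"] >= 0.05)
         .exclude(lambda r: r["wavelength"] > 4000.0)
         .sort("wavelength"))
print(table.display(["transition", "wavelength"]))
# -> [('1-4', 800.0), ('1-2', 1200.0)]
```

Because only the index vector changes, the view can be re-filtered repeatedly at negligible cost, which mirrors the cheap level (ii) processing described above.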

Although TOPbase was originally developed so that it could be installed locally on any
system with a fortran 77 compiler and distributed on tape, we soon became aware that in
academic environments the rapidly spreading Internet allowed remote access in addition to the
popular electronic mail. Thus a more efficient scheme was to install the database at a central
site, from which it could be invoked online with network commands such as telnet and ftp.
This arrangement was enthusiastically promoted by Claude Zeippen (an active OP participant
at the Observatoire de Paris, France), who proposed the Centre de Données astronomiques de
Strasbourg (CDS) as a convenient host. With much appreciated technical assistance from
François Ochsenbein, TOPbase became operational [16] at the CDS in January 1993 and has
remained so up to the present day.

With the appearance of the World Wide Web in the mid 90s, TOPbase became a handy
case study in the new technology. A web-based user interface was developed in 1995 by one
of us (CM) at IVIC with the assistance of Jesus Quiroz, and was included soon after in the CDS
home page. The current TOPbase web service is an abridged version of the command-based
system described above; in particular, the versatile table structure was not implemented due
to difficulties in postprocessing data after they have been streamed by web browsers. The
Web allows estimates of access frequency, and in the case of TOPbase, the General Internet
Search Engine for Atomic Data (GENIE) has provided consistent monthly statistics of over 100
hits in the last few years.

http://cds.u-strasbg.fr/

http://cdsweb.u-strasbg.fr/topbase/topbase.html



OPserver 

In the early days of TOPbase development at the IBM Venezuela Scientific Center, lengthy
discussions with Walter Cunto on atomic databases [15] always considered two access modes:
interactive online access by a user and recurrent access in batch processing by an application.
Although the former mode has grown since then, particularly with the ubiquitous establishment
of the Web, the latter has remained in the background. However, in the new cyber-infrastructure
of e-science, characterized by the mining of large volumes of data in distributed reservoirs, the
latter is expected to take a leading role. In this same context, we predict that the deployment of
data-intensive applications, which at present implies a local download, installation and periodic
upgrading, will change at its roots.

The development of the OPserver [22] was carried out with these ideas in mind, but above
all, with the contributions of several students and postdocs: Marcio Melendez (Universidad
Simon Bolivar, Caracas), Juan Gonzalez and Enrique Palacios (Universidad de Carabobo,
Valencia), Luis Rodriguez (IVIC, Caracas), Franck Delahaye (Observatoire de Paris), and the
technical support, expertise and encouragement of Paul Buerger (Ohio Supercomputer Center,
Columbus), Alberto Bellorín (Universidad Central de Venezuela, Caracas), Anil Pradhan (Ohio
State University, Columbus), Claude Zeippen (Observatoire de Paris) and Mike Seaton himself.

With the last revision [10] of the OP opacities to treat the inner-shell contribution, a
complete set of monochromatic opacities and a suite of simple-to-use codes to compute means and
radiative accelerations (OPCD_2.1) was released [125]. This was, in our opinion, a major
achievement, as it represented the culmination of a 20-year endeavor embodied in a user data
product. However, as mentioned before (see the section on The Opacity Project), some
sophisticated stellar models take into account microscopic diffusion processes (radiative
levitation, gravitational settling and thermal diffusion), which means that chemical element
stratification depends on stellar depth. This implies that mean opacities and radiative
accelerations must be recalculated at each depth point of the model and at each time step of the
evolution, and thus codes more efficient than OPCD_2.1 may be necessary. OPserver attempted to
enhance OPCD_2.1 by: (i) storing the monochromatic opacities permanently in main memory;
(ii) transcribing the codes as a subroutine library that can be linked to a stellar modeling code;
and (iii) offering several access modes.

http://www-amdis.iaea.org/GENIE/

http://cdsweb.u-strasbg.fr/topbase/op.html

Figure 6: Rosseland-mean opacities (RMO) and radiative accelerations (RA) are computed
with the codes in the OPCD_2.1 release in two stages: in a time-consuming Stage 1, they
are computed for the whole $(T, N_e)$ plane, followed by fast bicubic interpolations in Stage 2.
Reproduced from Figure 1 of Ref. [92] (http://dx.doi.org/10.1111/j.1365-2966.2007.11837.x).

Figure 7: OPserver enterprise showing the web-server-supercomputer tandem at the Ohio
Supercomputer Center (OSC) and the three available user modes. Reproduced from Figure 2
of Ref. [92] (http://dx.doi.org/10.1111/j.1365-2966.2007.11837.x).

In OPCD_2.1, Rosseland-mean opacities and radiative accelerations are computed in two
stages (see Figure 6). In a time-consuming first stage, and for a user-defined chemical mixture,
the MIXV and ACC codes read the monochromatic opacities from disk (~1 GB) and compute
means and accelerations on a representative tabulation of the complete $(T, N_e)$ plane, where
$T$ is the temperature and $N_e$ the electron density. In a fast second stage, the means and
accelerations are interpolated with OPFIT and ACCFIT on a stellar depth profile specified by
the user. In OPserver, these computations are improved by having a dedicated server for the
first stage, where the monochromatic opacities are always stored in RAM, and by parallelizing
the loop over the chemical mixture.
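A minimal sketch of the expensive first stage, assuming the standard definition of the Rosseland mean as the harmonic mean of the monochromatic opacity weighted by the temperature derivative of the Planck function, written in the variable u = hν/kT. The mixture opacity is formed as an abundance-weighted sum of per-element cross sections; the element names, fractions and cross sections are hypothetical, and neither the OP frequency mesh nor its units are reproduced here.

```python
import math


def rosseland_weight(u):
    """Normalized Planck-derivative weight in u = h*nu/(k*T).

    w(u) = (15 / 4 pi^4) u^4 e^u / (e^u - 1)^2, which integrates to 1 over u > 0.
    """
    em1 = math.expm1(u)  # e^u - 1, accurate for small u
    return 15.0 / (4.0 * math.pi ** 4) * u ** 4 * (em1 + 1.0) / em1 ** 2


def trapezoid(y, x):
    """Composite trapezoidal rule on a (possibly non-uniform) mesh."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def mixture_opacity(fractions, cross_sections):
    """Abundance-weighted sum of per-element monochromatic cross sections.

    fractions: element -> number fraction (hypothetical values)
    cross_sections: element -> list of values on a common u mesh
    """
    n = len(next(iter(cross_sections.values())))
    return [sum(x * cross_sections[el][j] for el, x in fractions.items()) for j in range(n)]


def rosseland_mean(u, kappa):
    """Harmonic mean: 1/kappa_R = int w/kappa du / int w du."""
    w = [rosseland_weight(x) for x in u]
    return trapezoid(w, u) / trapezoid([wi / ki for wi, ki in zip(w, kappa)], u)
```

In this picture Stage 1 repeats `rosseland_mean` at every tabulated $(T, N_e)$ point for each requested mixture, which is why keeping the data resident in memory and parallelizing over the mixture pays off. Stage 2 then only interpolates the stored means along the user's depth profile, a cheap operation not shown here.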

In Figure 7 we show the access modes offered by the OPserver, which has been resident
at the Ohio Supercomputer Center (OSC), Columbus, OH, USA, since 2006. It has been
implemented under a client-server architecture where the web server (client) communicates
with a supercomputer (server) via a socket interface. The 1 GB monochromatic opacity data
sets are always loaded in main memory on the server. OPserver may be accessed by a user
(client) from the interactive web server (Mode C); by downloading the package locally from
the CDS (OPCD_3.3), installing it and then linking the subroutine library (OPlibrary) to
the user modeling code (Mode A); or by just downloading and linking the OPlibrary, and then
accessing the data remotely at runtime from the central facility at the OSC (Mode B). The
latter mode has been designed for the new distributed grid computing environment, where
Mode A would be cumbersome due to data volumes and deployment times.
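The client-server split behind Mode B can be illustrated with a toy sketch using only the Python standard library: a server thread keeps a (here tiny, hypothetical) monochromatic table resident in memory and answers mixture requests over a TCP socket, so the client never loads the data itself. The newline-delimited JSON protocol, the table contents and the function names are assumptions for illustration only; they are not the OPserver wire format.

```python
import json
import socket
import socketserver
import threading

# Hypothetical resident table: element -> monochromatic cross sections on a
# common mesh. In OPserver this role is played by ~1 GB of OP data in RAM.
TABLE = {"H": [1.0, 2.0, 4.0], "Fe": [10.0, 20.0, 40.0]}


class MixtureHandler(socketserver.StreamRequestHandler):
    """Reads one JSON request line and answers with the mixture opacity."""

    def handle(self):
        request = json.loads(self.rfile.readline())
        mixture = request["mixture"]  # element -> number fraction
        n = len(next(iter(TABLE.values())))
        kappa = [sum(x * TABLE[el][j] for el, x in mixture.items()) for j in range(n)]
        self.wfile.write((json.dumps({"kappa": kappa}) + "\n").encode())


def start_server():
    """Starts the server on a free local port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), MixtureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(address, mixture):
    """Client side: send the mixture, read back the computed opacities."""
    with socket.create_connection(address) as sock:
        sock.sendall((json.dumps({"mixture": mixture}) + "\n").encode())
        with sock.makefile("r") as reply:
            return json.loads(reply.readline())["kappa"]
```

A session would call `start_server()`, pass `server.server_address` to `query`, and finish with `server.shutdown()` and `server.server_close()`. The design point mirrors the text: only the small request and the small result cross the network, while the bulky table stays on the server.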

The OPserver has recently been tested in innovative laboratory opacity experiments [13],
where a sample is heated with X rays and its spectral transmission measured with a back-light
in a high-energy-density facility. These initiatives have been motivated by the solar abundance
problem described above (see the Opacity Project section) and by the dependence of inertial
fusion and Z-pinch research on the opacities of light chemical elements.



The Iron Project 

By the end of the OP, most of the researchers who had been involved in the consortium wanted
to carry on with a second project of comparable scale and relevance.
The proposed challenge, the Iron Project (IP) [71], was just as ambitious: the systematic
computation of accurate radiative and electron impact excitation rates for ions of the iron
group of the Periodic Table, which were required to interpret a wide variety of astronomical
spectra where local thermodynamic equilibrium in the emitting plasmas cannot be assumed.
The IP work has been published in a long series of over 60 papers ("Atomic data from the
IRON Project") involving the old OP research groups plus new collaborating institutions
(especially from Canada) under the general coordination of David Hummer.

http://opacities.osc.edu

http://cdsweb.u-strasbg.fr/topbase/op.html

http://www.gridcomputing.com/

A main difference from the OP was that fine-structure levels would now have to be considered,
i.e. atomic data in intermediate coupling would become the end products. In the case
of radiative rates for both allowed and forbidden transitions, the familiar structure codes
SUPERSTRUCTURE and CIV3 [72] could compute accurate fine-structure A-values for light
elements ($Z < 36$, say) with configuration-interaction expansions of the type

$$\Psi(SL\pi) = \sum_{j} c_j\, \phi_j(\alpha_j SL\pi) \tag{8}$$

and by introducing relativistic corrections with the Breit-Pauli Hamiltonian

$$H_{\rm BP} = H_{\rm NR} + H_{\rm 1b} + H_{\rm 2b}, \tag{9}$$

where $H_{\rm NR}$ is the usual non-relativistic Hamiltonian. The one-body relativistic operators

$$H_{\rm 1b} = \sum_{n=1}^{N} \left[ f_n({\rm mass}) + f_n({\rm d}) + f_n({\rm so}) \right] \tag{10}$$

represent the spin-orbit interaction, $f_n({\rm so})$, the non-fine-structure mass variation, $f_n({\rm mass})$,
and the one-body Darwin correction, $f_n({\rm d})$. The two-body Breit operators are given by

$$H_{\rm 2b} = \sum_{n<m} \left[ g_{nm}({\rm so}) + g_{nm}({\rm ss}) + g_{nm}({\rm css}) + g_{nm}({\rm d}) + g_{nm}({\rm oo}) \right], \tag{11}$$

where the fine-structure terms are $g_{nm}({\rm so})$ (spin-other-orbit and mutual spin-orbit) and
$g_{nm}({\rm ss})$ (spin-spin), and the non-fine-structure counterparts are $g_{nm}({\rm css})$ (spin-spin
contact), $g_{nm}({\rm d})$ (two-body Darwin) and $g_{nm}({\rm oo})$ (orbit-orbit).

http://cdsweb.u-strasbg.fr/tipbase/ref/publi.html

http://www.usm.lmu.de/people/ip/institutions.html

On the other hand, the electron impact excitation rate for a transition $i \to f$ is conveniently
given in terms of the dimensionless effective collision strength

$$\Upsilon_{if}(T) = \int_0^\infty \Omega_{if}(E_f)\, \exp(-E_f/kT)\, d(E_f/kT), \tag{12}$$

which constitutes the thermal average of the collision strength for the transition, namely

$$\Omega_{if} = \frac{1}{2} \sum g\, |S_{if} - \delta_{if}|^2, \tag{13}$$

where $S$ is the scattering matrix containing the micro-physics of the scattering and $g = 2J + 1$
is a statistical weight. The excitation cross section may be written in terms of the collision
strength as

$$\sigma_{if} = \frac{\pi\, \Omega_{if}}{g_i\, k_i^2}, \tag{14}$$

$k_i^2$ being the incident electron energy in Rydberg units and $g_i = 2J_i + 1$, and the complex
scattering matrix is usually derived in terms of the real reactance matrix $R$,

$$S = (1 + iR)(1 - iR)^{-1}, \tag{15}$$

which is the quantity that is actually computed. At the time of the IP launch, the way
to obtain $R$ matrices in intermediate coupling was to compute them in LS coupling and then
perform the algebraic transformation

$$R^{J}(S_iL_iJ_il_iK_i,\, S_fL_fJ_fl_fK_f) = \sum_{SL} R^{SL}(S_iL_il_i,\, S_fL_fl_f)\; C(SLJ, S_iL_iJ_i, l_iK_i)\; C(SLJ, S_fL_fJ_f, l_fK_f) \tag{16}$$

with a code such as JAJOM [114, 115]. The $C(SLJ, S_iL_iJ_i, l_iK_i)$ are Clebsch-Gordan coefficients;
$S_iL_iJ_i$ and $SLJ$ are the spin, orbital angular momentum and total angular momentum
quantum numbers of the target (in state $i$) and of the total system (target + electron), respectively;
$l_i$ and $s_i$ are the orbital angular momentum and spin quantum numbers of the electron;
and $K$ is an intermediate quantum number such that

$$\mathbf{K}_i = \mathbf{J}_i + \mathbf{l}_i \quad \text{and} \quad \mathbf{J} = \mathbf{K}_i + \mathbf{s}_i. \tag{17}$$
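To make Eqs. (13)-(15) concrete, the sketch below builds S from a real symmetric 2 × 2 reactance matrix for a single, hypothetical partial-wave symmetry, and evaluates that symmetry's contribution to the collision strength and the resulting cross section (in units of $a_0^2$, with $k_i^2$ in Ryd). A real calculation uses many channels and sums Eq. (13) over all symmetries; the matrix values and helper names here are illustrative only.

```python
import math


def mat_mul(A, B):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def s_from_r(R):
    """Eq. (15): S = (1 + iR)(1 - iR)^-1 for a real symmetric R (S is then unitary)."""
    plus = [[(1.0 if i == j else 0.0) + 1j * R[i][j] for j in range(2)] for i in range(2)]
    minus = [[(1.0 if i == j else 0.0) - 1j * R[i][j] for j in range(2)] for i in range(2)]
    return mat_mul(plus, inv2(minus))


def omega_term(S, i, f, g):
    """One symmetry's contribution to Eq. (13): (1/2) g |S_if - delta_if|^2."""
    delta = 1.0 if i == f else 0.0
    return 0.5 * g * abs(S[i][f] - delta) ** 2


def cross_section_a0sq(omega, g_i, k2):
    """Eq. (14): sigma = pi Omega / (g_i k_i^2), in units of a0^2 (k_i^2 in Ryd)."""
    return math.pi * omega / (g_i * k2)
```

For a single open channel with $R = \tan\delta$, `s_from_r` reduces to the familiar $S = e^{2i\delta}$, which is a convenient sanity check on any implementation of Eq. (15).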

Some allowance for relativistic effects was made [T^ by the formalism of term-coupling coefficients
and by including the non-fine-structure mass variation and one-body Darwin corrections
of Eq. (10) in the Hamiltonian of the R-matrix method. Some of the earlier work in the
IP was carried out with this approach; however, a more formal treatment of the relativistic
close-coupling approximation was needed for some of the complex Fe ionic species.

Although the Breit-Pauli relativistic Hamiltonian had long been included in the R-matrix codes
[119, 120], the two-body Breit corrections of Eq. (11) were not taken into account. Nevertheless,
by merging this improved relativistic version with the OP non-relativistic package, an efficient
Breit-Pauli R-matrix suite (BPRM) came into general use in the IP from 1995 onwards for
both collisional and radiative work, e.g. in the electron impact excitation of the $3d^4\,^5D_J$
ground-state fine-structure transitions in Ti-like ions [20]. It was found that, for the $3d^4\,^5D_0 \to
3d^4\,^5D_1$ transition in Fe V, the BPRM effective collision strength at 10^ K was 35% below
that computed with JAJOM. However, it was soon realized that BPRM led to computationally
much more demanding runs, where the number of levels in a target representation could be
limited by processor core size, thus making some of the complex Fe species computationally
intractable. For this reason, the Intermediate Coupling Frame Transformation method (ICFT)
was developed [67] (outside the IP but quickly incorporated), whereby relativistic effects may
be included via frame transformations based on multi-channel quantum defect theory. For
instance, for O-like Fe XIX, the original IP calculation [30] with BPRM included 92 levels in
the target representation, while a revisit [38] with ICFT allowed up to 342 target levels.

Figure 8: Infrared lines originating from ions with open-shell configurations $np$, $np^5$, $np^2$ and
$np^4$, respectively belonging, for $n = 2$, to the B, F, C and O isoelectronic sequences, and for
$n = 3$, to the Al, Cl, Si and S sequences.

Figure 9: Collision strength for the $2s^22p^4\,^3P_0 \to {}^3P_1$ infrared transition in O-like Al VI
displaying a bundle of broad resonances just above threshold. Reproduced from Figure 2 of
Ref. [39] (http://adsabs.harvard.edu/abs/1994A%26AS..108....1B).

Progress in the IP translated over time into many improvements to the R-matrix
codes and the establishment of useful numerical utilities in computational atomic physics. For
instance, since the infinite summation of Eq. (13) converges very slowly for allowed transitions,
a "top-up" procedure based on the Burgess sum rule [37] was incorporated in BPRM. The
inclusion of the two-body Breit corrections (see Eq. (11)) has also been recently carried out
by G. X. Chen, W. Eissner & A. K. Pradhan (in preparation). Furthermore, a graphical
method [31] for analyzing the partial-wave convergence of effective collision strengths, based on
the reduction of the infinite temperature scale to a finite interval ($0 < T < 1$), became popular
among IP members. However, it relied on the infinite-temperature value of the effective collision
strength which, for allowed transitions, may be easily estimated from the $f$-value. For forbidden
transitions, on the other hand, it is estimated from the Coulomb-Born limit [33], which was not
available in the earlier years of the project and was eventually coded into SUPERSTRUCTURE
by W. Eissner (unpublished).
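The reduced-temperature idea can be sketched as follows, assuming a mapping of the Burgess-Tully type (the formulas of Ref. [31] are not spelled out in the text). For an allowed (type 1) transition, $x = 1 - \ln C/\ln(kT/E_{ij} + C)$ and $y = \Upsilon/\ln(kT/E_{ij} + e)$, with the $x \to 1$ limit $4 g_i f_{ij}/E_{ij}$ ($E_{ij}$ in Ryd) fixed by the $f$-value; for a forbidden (type 2) transition, $x = (kT/E_{ij})/(kT/E_{ij} + C)$ and $y = \Upsilon$, whose $x \to 1$ limit is the Coulomb-Born value. The scaling constant $C$ is an adjustable choice, and the function names are hypothetical.

```python
import math


def reduce_allowed(kT_over_E, upsilon, C):
    """Type 1 (dipole-allowed) scaling, assuming a Burgess-Tully form.

    x = 1 - ln C / ln(kT/E_ij + C) maps 0 < T < inf onto 0 < x < 1 (needs C > 1);
    y = Upsilon / ln(kT/E_ij + e) tends to a finite limit as x -> 1.
    """
    x = 1.0 - math.log(C) / math.log(kT_over_E + C)
    y = upsilon / math.log(kT_over_E + math.e)
    return x, y


def reduce_forbidden(kT_over_E, upsilon, C):
    """Type 2 (forbidden) scaling: x = (kT/E_ij) / (kT/E_ij + C), y = Upsilon."""
    x = kT_over_E / (kT_over_E + C)
    return x, upsilon


def allowed_limit(g_i, f_ij, E_ij_ryd):
    """High-temperature point y(x = 1) = 4 g_i f_ij / E_ij for type 1 (E_ij in Ryd)."""
    return 4.0 * g_i * f_ij / E_ij_ryd
```

Plotting $y$ against $x$ for the computed $\Upsilon$ values and checking that the curve heads smoothly towards the known point at $x = 1$ is the kind of visual convergence test the text describes: a curve that bends away near $x = 1$ points to missing high partial waves.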

http://iso.esac.esa.int/

http://spitzer.caltech.edu/

Figure 10: Effective collision strength for the $2s^22p^5\,^2P^o_{3/2} \to {}^2P^o_{1/2}$ transition in F-like Ar X
(a) using the theoretical level separation and (b) using the experimental level separation. Differences
of around a factor of 2 are observed at $\log T = 4$. Reproduced from Figure 5 of Ref. [116]
(http://adsabs.harvard.edu/abs/1994A%26AS..107...29S).

Figure 11: Effective collision strength for the $3s^23p\,^2P^o_{1/2} \to {}^2P^o_{3/2}$ transition in Al-like Fe XIV
in the region below the $3s3p^2$ excited levels, showing a dense resonance structure. Reproduced
from Figure 2 of Ref. [133] (http://adsabs.harvard.edu/abs/1996A%26A...309..677S).

The first stage of the IP was concerned with the calculation of electron impact excitation
rates for astrophysically abundant ions with open-shell configurations $np$, $np^5$, $np^2$ and $np^4$, which
give rise to infrared (IR) lines (see Figure 8). The following isoelectronic sequences were studied:
for $n = 2$, the B-, F-, C- and O-like ions; and for $n = 3$, the Al-, Cl-, Si- and S-like ions. These
data were required for the modeling of spectra observed with the Infrared Space Observatory
(ISO), launched in November 1995, and, after August 2003, with the Spitzer Space Telescope
(Spitzer). A delicate situation emerged in the electron excitation of IR transitions, which
is illustrated [39] with the $2s^22p^4\,^3P_0 \to {}^3P_1$ transition in O-like Al VI (see Figure 9). For
high-charge species (e.g. Al VI), a bundle of broad resonances belonging to series converging
to the levels of the first excited configuration, $2s2p^5$, sits just above the reaction threshold.
As a consequence, the effective collision strength at the lower temperatures is not only greatly
enhanced but also becomes very sensitive to target-level separations [116] (see Figure 10); i.e.
experimental target levels must be used in the R-matrix runs, and the convergence of the
close-coupling expansion (Eq. (5)) with respect to target levels is crucial. An extreme case [133] is the
$3s^23p\,^2P^o_{1/2} \to {}^2P^o_{3/2}$ transition in Al-like Fe XIV which, as shown in Figure 11, displays a
dense resonance structure arising from excited levels of the $n = 3$ and $n = 4$ complexes.
An important feature here is the choice of the energy mesh interval, which must be fine enough to
resolve the resonances; in the region between the $3s^23p\,^2P^o$ and $3s3p^2\,^4P$ thresholds, an energy
mesh with 10,500 points was used [133]. Finally, a compilation [9] of IP A-values and effective
collision strengths for interpreting IR transitions in nebulae was published in 2006.
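A small numerical experiment shows why the energy mesh matters when evaluating Eq. (12). The sketch assumes a hypothetical collision strength made of a flat background plus one narrow Lorentzian resonance just above threshold, and integrates the thermal average with the trapezoidal rule on a coarse and on a fine mesh of final electron energies $E_f$ (in Ryd). The background, resonance position, width, height and temperature are made-up numbers, not IP data.

```python
import math


def mesh(step, e_max):
    """Uniform mesh of final electron energies E_f (Ryd) from threshold (0) to e_max."""
    n = int(round(e_max / step))
    return [i * step for i in range(n + 1)]


def omega_model(E, background=1.0, e_res=0.0537, width=1e-4, height=500.0):
    """Hypothetical Omega(E_f): flat background plus one Lorentzian resonance."""
    half = 0.5 * width
    return background + height * half ** 2 / ((E - e_res) ** 2 + half ** 2)


def upsilon(energies, omegas, kT):
    """Eq. (12) by the trapezoidal rule: int Omega(E_f) exp(-E_f/kT) d(E_f/kT)."""
    x = [e / kT for e in energies]
    y = [om * math.exp(-xi) for om, xi in zip(omegas, x)]
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


kT = 0.02                 # Ryd, a hypothetical low temperature
coarse = mesh(0.01, 1.0)  # steps of half a kT: the resonance falls between mesh points
fine = mesh(1e-5, 1.0)    # ten points per resonance width
ups_coarse = upsilon(coarse, [omega_model(e) for e in coarse], kT)
ups_fine = upsilon(fine, [omega_model(e) for e in fine], kT)
```

With these numbers the coarse mesh essentially returns the background value, whereas the fine mesh picks up a resonance contribution of roughly a quarter of it; this is the low-temperature enhancement the text attributes to near-threshold resonances, and it is simply lost if the mesh cannot resolve them.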

The second stage of the IP was concerned with the computation of electron impact excitation
rates for transitions within the $n \le 3$ complexes and, when possible, $n \le 4$, in ionic species of
the complete Fe isonuclear sequence. These data are mainly needed in the interpretation of the
EUV solar spectrum, and this proved to be quite an ambitious objective because, for some systems
(e.g. second-row ions with open $n = 3$ shells such as the Al-, Si-, P-, S- and Cl-like species), the
calculations turned out to be very large indeed due to the slow convergence of the close-coupling
expansion with respect to target levels and the need to include relativistic effects. In fact,
some of them, e.g. Fe XI (S-like) and Fe XIII (Si-like), have not as yet been published.

Figure 12: Effective collision strength for the $3s^23p\,^2P^o_{1/2} \to 3s3p^2\,^2D_{5/2}$ transition in Al-like
Fe XIV. Solid line: IP close-coupling calculation [134]. Open circles: independent close-coupling
calculation [53] with a reduced target representation. Dashed line: distorted wave
calculation [26] that neglects channel coupling. Reproduced from Figure 2 of Ref. [134]
(http://dx.doi.org/10.1051/aas:2000319).

Figure 13: Density variation of the 274 Å/270 Å line ratio in Fe XIV. Solid line: IP work [134].
Dashed line: CHIANTI v1.0 using previously computed [53] close-coupling effective collision
strengths. Asterisks: distorted wave data [26, 27]. Reproduced from Figure 4 of Ref. [134]
(http://dx.doi.org/10.1051/aas:2000319).

A topical example of this second stage is the electron impact excitation of the ground state of
Al-like Fe XIV. In this calculation [134], the target representation included 18 LS terms arising
from the configurations $3s^23p$, $3s3p^2$, $3s^23d$, $3p^3$ and $3s3p3d$, with the addition of 17 correlation
configurations containing $nl$-orbitals with $n \le 4$. Intermediate-coupling collision strengths were
obtained with the algebraic transformation of Eq. (16), including relativistic target effects via
term-coupling coefficients. Previous work [53] was carried out with the close-coupling method
(R-matrix) using a reduced three-configuration target representation, and in the distorted wave
approximation [26, 27], which neglects channel coupling. In this respect, it may be seen in
Figure 12 that, although the effective collision strengths for the $3s^23p\,^2P^o_{1/2} \to 3s3p^2\,^2D_{5/2}$
transition computed with the three methods agree at high temperatures ($\log T \approx 7$), at the
lower temperatures ($\log T \approx 5.5$) the IP result is a factor of 2 higher; this enhancement is caused
by the adequate treatment of resonances in the IP. Furthermore, as depicted in Figure 13, and
in contrast with the IP, previous work resulted in a density-insensitive 274 Å/270 Å line ratio, which
has been astronomically observed to vary between 1.3 and 2.3. The IP ratio was found to vary
between 1.0 and 2.0, in good agreement with observations.
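Why a line ratio can be density sensitive at all, and why the effective collision strengths feed directly into it, can be seen in a deliberately schematic model. The sketch below assumes a hypothetical four-level ion: a metastable level 1 whose population relative to the ground level 0 is set by collisional excitation against radiative decay, and two upper levels 2 and 3 treated in the coronal limit, each excited from one lower level only. Rate coefficients use the standard Maxwellian expression $q = 8.629\times10^{-6}\,\Upsilon\,e^{-\Delta E/kT}/(g\,T^{1/2})$ cm$^3$ s$^{-1}$. All level data are invented for illustration and are not Fe XIV values.

```python
import math

K_RATE = 8.629e-6    # cm^3 s^-1 K^1/2, Maxwellian rate prefactor
K_B_RYD = 6.3336e-6  # Boltzmann constant in Ryd/K


def excitation_rate(ups, g_lower, dE_ryd, T):
    """q_lu = 8.629e-6 Upsilon exp(-dE/kT) / (g_l sqrt(T))  [cm^3 s^-1]."""
    return K_RATE * ups * math.exp(-dE_ryd / (K_B_RYD * T)) / (g_lower * math.sqrt(T))


def deexcitation_rate(ups, g_upper, T):
    """q_ul = 8.629e-6 Upsilon / (g_u sqrt(T)), by detailed balance."""
    return K_RATE * ups / (g_upper * math.sqrt(T))


def line_ratio(Ne, T, p):
    """Schematic I(3->1) / I(2->0) for a hypothetical four-level ion.

    Level 1 is metastable and balanced against the ground level 0; levels 2 and 3
    are in the coronal limit, each excited from one lower level and decaying back.
    """
    q01 = excitation_rate(p["ups01"], p["g0"], p["E1"], T)
    q10 = deexcitation_rate(p["ups01"], p["g1"], T)
    n1_over_n0 = Ne * q01 / (p["A10"] + Ne * q10)
    q02 = excitation_rate(p["ups02"], p["g0"], p["E2"], T)
    q13 = excitation_rate(p["ups13"], p["g1"], p["E3"] - p["E1"], T)
    # Energy-flux ratio: each line's photon rate times its photon energy.
    return n1_over_n0 * (q13 * (p["E3"] - p["E1"])) / (q02 * p["E2"])


# Hypothetical level data (energies in Ryd, A-value in s^-1); not Fe XIV values.
PARAMS = {"g0": 2, "g1": 4, "E1": 0.1, "E2": 3.4, "E3": 3.5, "A10": 60.0,
          "ups01": 0.5, "ups02": 1.0, "ups13": 1.0}
```

In this toy model the ratio grows linearly with density while collisional de-excitation of level 1 is negligible against its radiative decay, and saturates once it dominates; both the critical density ($A_{10}/q_{10}$) and the saturated value depend on the effective collision strengths, so errors in those data can flatten or shift the predicted density dependence.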

It may be assumed that resonances enhance the effective collision strength mainly at the
lower temperatures, but this is not always so. In such cases, a detailed study of the whole
resonance structure and the inclusion of all the contributing thresholds in the target representation
are required. This problem is illustrated [25] with the excitation of the $2p_{1/2} \to 2p_{3/2}$
transition in Li-like Fe XXIV (see Figure 14), where a huge bump with respect to a distorted wave
result [145] occurs at around $\log T \approx 7$. Moreover, estimating the right number of contributing
thresholds for the $3s^23p^q$ ions ($q$ = 2 to 4) has become, as experienced [281 ISH EQl 1132] with
Fe XII ($q = 3$), a long ordeal; however, as targets were improved, long-standing discrepancies
with observations were progressively ironed out [2H1 1132j.

An interesting calculation [109] has been the low-energy electron impact excitation of neutral
iron (Fe I), since most of the IP work involved positively charged targets. An important effect
here was to take into account the polarizability of the terms included in the close-coupling
expansion, namely ^D, ^F and ^F, by considering (at least for the first two) four polarized
pseudo-states (^P°, ^D°, ^F° and ^G°). A second challenge was the threshold behavior of the
collision strength for a neutral target, $\Omega(E) \to 0$ as $E \to 0$, which differs from that for an ionic
target ($\Omega(E) \to \Omega_{\rm th}$, a finite value, as $E \to 0$), and obeys the Wigner threshold law

$$\Omega(i, f) \propto E^{\,l + 1/2}, \tag{18}$$

where $l$ is the dominant orbital angular momentum contribution. Compliance with this law
may be appreciated for the $^5D_4 \to {}^5D_3$ transition shown in Figure 15, where at very
low energies (E < 5 x 10^^ Ryd) the s-wave ($l = 0$) dominates, while at higher energies the
p-wave ($l = 1$) takes over.
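Because Eq. (18) is a power law, the dominant partial wave can be read off the local slope of $\ln\Omega$ versus $\ln E$: a slope of 1/2 signals the s-wave and 3/2 the p-wave. The helper below estimates $l$ from two tabulated near-threshold points; its names are hypothetical, and on real data resonances near threshold would have to be avoided or averaged out first.

```python
import math


def wigner_exponent(E1, omega1, E2, omega2):
    """Log-log slope of Omega(E) between two points, minus 1/2.

    Under the Wigner law Omega ~ E^(l + 1/2) this estimates l.
    """
    slope = math.log(omega2 / omega1) / math.log(E2 / E1)
    return slope - 0.5


def dominant_partial_wave(E1, omega1, E2, omega2):
    """Nearest integer l consistent with the local slope."""
    return round(wigner_exponent(E1, omega1, E2, omega2))
```

Applied to a synthetic $\Omega(E) = E^{1/2} + 10^4 E^{3/2}$, the helper returns $l = 0$ between $10^{-7}$ and $10^{-6}$ Ryd and $l = 1$ between $10^{-2}$ and $10^{-1}$ Ryd, mimicking the s-wave to p-wave hand-over described in the text.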

Figure 14: Effective collision strength for the $2p_{1/2} \to 2p_{3/2}$ transition in Li-like Fe XXIV. Full
line: IP [25]. Broken line: distorted wave result [145]. The IP data show a huge bump
at $\log T \approx 7$ due to the resonance contribution. Reproduced from Figure 9 of Ref. [25]
(http://dx.doi.org/10.1051/aas:1997384).
Figure 15: Low-energy behavior of the collision strength $\Omega(^5D_4, {}^5D_3)$ in Fe I showing compliance
with the Wigner threshold law. Full curve: IP close-coupling calculation [109]. Dotted line:
Wigner law for s-wave ($l = 0$). Dashed line: Wigner law for p-wave ($l = 1$). Reproduced from
Figure 1 of Ref. [109] (http://dx.doi.org/10.1051/aas:1997328).



Figure 16: Photoionization cross section of the $3d^64s^2\,^5D$ ground state of Fe I, where the
contributing thresholds are denoted. Full curve: IP close-coupling result |1^. Broken curve:
calculation [81] based on many-body perturbation theory. Reproduced from Figure 2 of Ref. [H]
(http://dx.doi.org/10.1051/aas:1997327).



Massive sets of relativistic radiative data for both allowed and forbidden bound-bound
transitions were also computed in the IP for several Fe ions (e.g. Fe XVII in Ref. [96]), increasing
the previously available data sets for such ions by orders of magnitude. For allowed transitions,
the BPRM package was used; level identification then became an issue, which was greatly simplified
by the development of a new level identification algorithm. For the forbidden type (E2, E3, M1
and M2), the SUPERSTRUCTURE atomic structure code was employed. Photoionization
cross sections for bound-free transitions were also computed with BPRM for some species; an
outstanding case fT^j was Fe I, for which cross sections were computed for 1,117 levels with $n < 10$
and $l < 7$. The Fe II target representation in this large calculation required 52 LS terms from
the configurations $3d^64s$, $3d^7$, $3d^64p$, $3d^54s^2$ and $3d^54s4p$. In Figure 16 the photoionization
cross section of the ground state of Fe I from this work is compared with a previous theoretical
estimate [81].



TIPbase 

The TIPbase database contains the fine-structure atomic data computed in the IP, namely
level energies, radiative transition probabilities (A-values) and electron impact excitation cross
sections (collision strengths) and rates (effective collision strengths) for fine-structure transitions.
Its DBMS was developed in Fortran 90 and its web user interface in JavaScript, the latter
by Jesús Quiroz (IVIC) and Marcio Melendez (Universidad Simón Bolívar). Among its most
interesting features are extensive data documentation and the possibility to display and manipulate
plots interactively with a Java applet, which may be used to study trends and resonances
in the excitation cross sections. Efforts have been made to include the bulky data sets containing
collision strengths, which allow the computation of rates for non-Maxwellian distributions,
photoionization cross sections and recombination rates. The inclusion of the latter parameter
was phased out, however, due to differences within the IP on how to compute recombination rates.
Up to the present, TIPbase contains only IP data, which perhaps limits its usability as it mainly
concentrates on the iron isonuclear sequence.
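The remark that tabulated collision strengths allow rates for non-Maxwellian distributions can be made concrete: with $\sigma(E)$ from Eq. (14), the rate coefficient for any normalized electron energy distribution $f(E)$ is $q = \int \sigma(E)\, v(E)\, f(E)\, dE$. The sketch below, in cgs units with rounded CODATA constants, checks itself against the Maxwellian closed form $q = 8.629\times10^{-6}\,\Upsilon\,e^{-\Delta E/kT}/(g_i T^{1/2})$ cm$^3$ s$^{-1}$ for a constant $\Omega$, and offers a hypothetical two-temperature mixture as a simple non-Maxwellian case. $\Omega$ is taken here as a function of the incident energy; the distribution, energies and function names are illustrative assumptions.

```python
import math

A0 = 5.29177e-9      # Bohr radius (cm)
RY = 2.179872e-11    # Rydberg energy (erg)
ME = 9.1093837e-28   # electron mass (g)
KB = 1.380649e-16    # Boltzmann constant (erg/K)


def cross_section(E, omega, g_i):
    """Eq. (14) in cgs: sigma = pi a0^2 Omega / (g_i k_i^2), with k_i^2 = E/Ry."""
    return math.pi * A0 ** 2 * omega / (g_i * E / RY)


def maxwellian(T):
    """Returns the normalized Maxwellian energy distribution f(E) at temperature T."""
    kT = KB * T
    return lambda E: 2.0 * math.sqrt(E / math.pi) * kT ** -1.5 * math.exp(-E / kT)


def two_temperature(T1, T2, frac2):
    """Hypothetical non-Maxwellian case: a weighted mix of two Maxwellians."""
    f1, f2 = maxwellian(T1), maxwellian(T2)
    return lambda E: (1.0 - frac2) * f1(E) + frac2 * f2(E)


def rate_coefficient(omega_of_E, g_i, dE, f, E_max, n=20000):
    """q = int_{dE}^{E_max} sigma(E) v(E) f(E) dE (cm^3 s^-1), trapezoidal rule."""
    h = (E_max - dE) / n
    total = 0.0
    for j in range(n + 1):
        E = dE + j * h
        v = math.sqrt(2.0 * E / ME)
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cross_section(E, omega_of_E(E), g_i) * v * f(E)
    return total * h
```

Because the integral is linear in $f$, any distribution that can be written as a superposition of simpler ones yields the corresponding superposition of rates; this is exactly why the full $\Omega(E)$ tables are more useful than Maxwellian-averaged $\Upsilon(T)$ alone.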

http://cdsweb.u-strasbg.fr/tipbase/home.html
CHIANTI

CHIANTI is an atomic data application for spectroscopic diagnostics of astrophysical plasmas
[H], maintained since the end of the 90s by an international collaboration that includes
several IP members. As described in its six releases, it may be regarded as a computer tool,
developed with a friendly user interface based on the Interactive Data Language (IDL),
to calculate synthetic, optically thin, emission-line spectra of astrophysical sources and to derive
plasma diagnostics. It makes use of an extensive, periodically assessed and upgraded
atomic database which caters mainly for the solar community and, more recently, for X-ray
work. For a large number of ions, its database contains energy levels, wavelengths, radiative
transition probabilities, electron impact excitation rates (many from the IP) and continuum
data. More recently, CHIANTI has been extended with inner-shell transitions, satellite lines,
proton excitation, two-photon, relativistic free-free and free-bound continua, ionization and
recombination rates and Fe K lines. The impressive CHIANTI citation list certainly gives an
idea of the extent of its current user base.



Current projects 

Although the OP and IP are still on-going enterprises, most of their members are involved in 
other atomic data computational projects. Some (e.g. the RmaX Network) may be regarded 
as spin-offs from the OP/IP, but others, although completely independent, carry the Seaton 
spirit and the expertise acquired in the past two decades of team work. We briefly mention 
here a few of the more active collaborations. 



The UK APAP/RmaX Network 

The Atomic Processes for Astrophysical Plasmas Network of the United Kingdom (APAP) is
a continuation of the original UK_RmaX Network, which was a spin-off of the IP. It is dedicated
to the calculation of radiative and collisional data for the modeling of astrophysical plasmas
and spectral analysis. Data sets containing both fundamental and derived atomic data are
made available, and much work has been done in the development and upkeep of the R-matrix
and atomic structure codes, particularly in relation to improvements in access, user
interfaces (NAMELIST based) and parallelization. Recent publications concentrate on topics
such as atomic-data benchmarking, inner-shell electron impact excitation, radiative and Auger
damping, fluorescence yields, electron-ion recombination, dielectronic recombination, and EUV
line identifications and diagnostics.

http://www.chianti.rl.ac.uk/

http://www.chianti.rl.ac.uk/publications/pub.html

http://www.chianti.rl.ac.uk/publications/citation_List.html

http://amdpp.phys.strath.ac.uk/UK_RmaX/

http://amdpp.phys.strath.ac.uk/UK_RmaX/APAP_pub.html
Work at OSU



Recent work at the Ohio State University (OSU), one of the most active members of the IP, has
concentrated on contributions to the RmaX Network and on the computation of electron-ion
recombination rate coefficients and photoionization cross sections for astrophysically abundant
elements (e.g. Fe XVI, Fe XVII and Fe XXI). An interesting new approach has been the
relativistic coupled-cluster method used [136] to compute the transition probabilities of Kα lines
in F-like ions ($Z$ = 10 to 79), finding a crossover between Kα1 and Kα2 at $Z$ = 41-42. In the extensive
work on recombination (see Ref. [99] and references therein), high-precision photoionization
and recombination cross sections and rates are computed self-consistently with BPRM using
the same wave-function expansion, thus treating radiative and dielectronic recombination in a
unified manner [98]. It is then shown [100] that resonances in the unified recombination cross
sections correspond directly to dielectronic satellite spectra. Moreover, in the context of the
dielectronic satellite lines of He-like Fe XXV and Ni XXVII, this method has been recently
compared [97] with the isolated resonance approximation commonly used in plasma modeling;
good agreement (within 20%) is obtained for the rates of strong lines, but larger discrepancies are
found for the weaker features. Large data sets containing A-values and gf-values for many ions
are being computed using BPRM for fine-structure E1 transitions and SUPERSTRUCTURE
for the forbidden type. All radiative data from OSU are available online from the
NORAD-Atomic-Data web site.

xstarDB 

XSTAR [15] is a widely used spectrum modeling package that computes the physical conditions 
and emission spectra, particularly in the X-ray region, of a photoionized gas. It can be run 
either as a single model or as a grid of models, the latter to estimate model sensitivity to 
input-parameter ranges. For the past eight years, a considerable effort has been focused on 
improving its atomic database (xstarDB) with accurate inner-shell radiative and collisional 
data (K-vacancy level energies, wavelengths, A-values, Auger widths, photoabsorption cross 
sections and electron impact excitation cross sections) which are required in the analysis of 
K lines, particularly from nitrogen [SD], oxygen [^I], second-row elements (Ne, Mg, Si, S, Ar 
and Ca) [IDSl 112], iron [ID31 [ISl M [103 EHl HI] and nickel [M]. This is an interesting 
project inasmuch as both data producers and users are working closely together to generate
a tailor-made atomic database.

One of the most relevant findings in this work [103] has been the smearing of the K edge
caused by both radiation and Auger dampings, exemplified with the K photoabsorption of the
Fe XVII ground state, which is dominated by the double series of $1s2s^22p^6np\,^{1,3}P^o$ resonances
(see Figure 17). These resonances decay via the following manifold:

http://www.astronomy.ohio-state.edu/~nahar/nahar_radiativeatomicdata/index.html
Figure 17: Total photoabsorption cross section of the ground state of Ne-like Fe XVII in the
region near the K edge (7.72 keV). (a) Damping has been neglected. (b) Radiation damping is
included. (c) Radiation and spectator Auger dampings are included. Reproduced from Figure 1
of Ref. [103] (http://dx.doi.org/10.1086/344243).



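$$
1s2s^22p^6np \;\longrightarrow\;
\begin{cases}
1s^22s^22p^6 + h\nu & (19)\\[2pt]
1s^22s^22p^5np + h\nu_\alpha & (20)\\[2pt]
\mathrm{KL}n:\ 1s^22s^22p^5 + e^-,\quad 1s^22s2p^6 + e^- & (21)\\[2pt]
\mathrm{KLL}:\ 1s^22s^22p^4np + e^-,\quad 1s^22s2p^5np + e^-,\quad 1s^22p^6np + e^- & (22)
\end{cases}
$$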

which is distinctly dominated by the radiative Kα (Eq. (20)) and Auger KLL (Eq. (22)) channels,
where the active $np$ electron remains a spectator; therefore, their decay transition probabilities
are independent of $n$. Consequently, as seen in Figure 17, if the Kα and Auger KLL decay
channels are not taken into account, i.e. damping is neglected, the resonances are narrow and
asymmetric and converge towards a sharp K edge. On the other hand, if damping is included,
the resonance series display constant widths and symmetric profiles that become progressively
smeared with increasing $n$ to produce a smooth transition through the K threshold. Such edge
smearing may be relevant to the interpretation [78] of absorption features in astrophysical
X-ray spectra.
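The reason for the smearing can be put in numbers: the damping width is independent of $n$, while the spacing between consecutive members of a Rydberg series shrinks roughly as $2z^2/n^3$, so beyond some $n$ neighbouring resonances overlap and the series merges into a smooth edge. A back-of-the-envelope sketch, assuming hydrogenic energies (no quantum defects) with an effective charge $z$ and a fixed total width, both hypothetical values rather than Fe XVII data:

```python
def spacing(n, z):
    """Gap (Ryd) between members n and n+1 of a hydrogenic series: z^2 (1/n^2 - 1/(n+1)^2)."""
    return z ** 2 * (1.0 / n ** 2 - 1.0 / (n + 1) ** 2)


def first_overlapping_n(z, width, n_start=2, n_max=10000):
    """Smallest n whose spacing to n+1 falls below the (n-independent) damping width.

    Returns None if no member up to n_max overlaps its neighbour.
    """
    for n in range(n_start, n_max + 1):
        if spacing(n, z) < width:
            return n
    return None
```

For example, with $z = 17$ and a total width of 0.01 Ryd the spacing first drops below the width at $n = 39$; above that, consecutive resonances blend into the continuum-like profile of panel (c) in Figure 17, whereas without damping (zero width) they never overlap and the edge stays sharp.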



ADAS-EU 

The Atomic Data and Analysis Structure for Fusion in Europe (ADAS-EU) is a project
that enables plasma diagnostics and modeling for fusion laboratories in Europe and, in the
near future, for ITER. Databases containing both fundamental and derived atomic data
are implemented and maintained, and the computation and measurement of new data are
also promoted. It has grown from the active ADAS consortium, which developed a set of computer
codes, subroutine libraries and atomic data sets for plasma modeling and the analysis and
interpretation of spectral emission. New topics of interest are: heavy element spectroscopy
and models; charge exchange spectroscopy; beam stopping and beam emission spectroscopy;
diatomic spectra; and collisional-radiative models.
E-science and the data deluge



With increasing frequency and effectiveness, researchers are exchanging data, ideas, publications,
references and images through a variety of electronic means. However, a number of
innovative collaborative environments are emerging which have not been readily adopted by
the science community as its everyday workplace [311 HSl HU |50]. This new ubiquitous mode
of collaboration is mainly supported by e-mail and instant messaging. Beyond this
exchange among peers, a post-Gutenberg era has dawned in which information producers, i.e.
researchers, research centers and academic institutions, now have the ability to publish and
disseminate their intellectual production without intermediaries.

Terms such as "cyber-infrastructure", "e-science" and, more recently, "e-research" have
been coined to describe this knowledge revolution. Among its most distinctive features we can
cite: the intensive use of information and communication technologies (ICT); geographically
distributed resources for information processing and analysis; and, above all, its ubiquity
(see Refs. (SHI EOl E] and references therein). Its main challenge is to manage, analyze
and preserve the data deluge caused by an impressive variety of sensors, ambitious numerical
simulations and large-scale facilities such as particle accelerators, tokamaks, synchrotron
radiation sources and ground- and satellite-borne astronomical telescopes. Under this digital
avalanche, which surpasses any traditional data management capacity, ICT can transform these
instruments into powerful computing environments for data mining.

As we have illustrated here with the OP/IP, large data productions are usually carried
out by global collaborations, i.e. multinational science groups that generate large volumes of
data which are geographically distributed and maintained only during the project life cycle. Most of
these data are never published and, when the collaborations end, many are lost or stashed away
in national (or international) reservoirs that have nothing to do with their origins. Production
decisions, approximations and provenance are buried in a huge electronic correspondence to
which no-one has access [63].

A similar path is followed by small data producers scattered around the globe; thus both 
large and small data producers face the same problems in knowledge cataloging, preservation 
and dissemination. It is imperative to plan and build repositories that store data as they emerge 
and to retain the history of the decisions and criteria that generate them [221 ESI EOl • In 
spite of pioneering efforts more than a decade ago to create a framework of recommendations to
guide scientific database preservation and dissemination [52], it is only recently that multilateral
organizations and planners in Europe and the United States have started to generate technical 
reports to encourage the preservation of important scientific data collections |H HIl |82l [861 HlOj 
1128^ 1135] . However, many of these recommendations have not permeated to the producing 
communities and/or to the collection custodians in these countries. The situation is even worse 
in Latin America where we are still not convinced by, or at least aware of, the new paradigms 
in the production and dissemination of scientific knowledge, and consequently, only a low-level 
use of ICT has been incorporated in teaching and research. 

This increasing awareness of the need to preserve and share data for use and reuse in knowledge creation
has given the Open Access Movement a new facet. The self-preservation and dissemination of
scientific data are beginning to be considered and discussed in different scenarios O HU |86l 
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\IW\ \ll'S\ \l'Sb\ 1143] . Inspired by the reflections and conceptual bases of the debutante, the Open 
Data Movement, it is essential to establish standards and protocols such that research data can 
be processed without barriers or high costs. The academic community perceives published data 
as their heritage; however, many publishers are insisting on copyright which, without doubt, is 
one of the greatest obstacles in e- research [55] . 

Data & metadata in e-science 

The reasons for preserving data derive from the fact that observations, knowledge and un- 
derstanding are cumulative. A datum can be considered a piece of information that can be 
processed, interpreted, transmitted and preserved. Data arise either from measurements (ob- 
servational or experimental data), simulated results (synthetic data computed with mathemat- 
ical models) or historical records (historical data). Data may be considered raw if they are 
generated directly from measurements and models, or derived if they undergo further filtering 
and processing. But, additionally, there are many subtleties and complexities: what might be 
considered derived data for some processes are raw for others, and historical data may emerge 
from the mathematical models that produce the record. 

E-research requires the automatic handling of large volumes of data, which must be
streamlined by standards specified by both the data producers and users. The basic information
employed to describe data (its content, features, dates, terms of use, source, ownership and
other characteristics) is referred to as metadata. Metadata allow users to evaluate
whether a particular data set is suitable for their purposes and facilitate information access.
Adequate documentation regarding sampling, analytical procedures, anomalies, accuracy and
structure is vital for the correct interpretation of the data in the future. Metadata can facilitate [93]:

• data identification and acquisition for a given subject, for a specific period of time or 
geographic location; 

• automatic analysis and data modeling; 

• the inclusion of semantic knowledge elements associated with the data. 
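To make the first of these uses concrete, the following minimal Python sketch filters a catalog of metadata records by subject and time coverage. The field names, the catalog entries and the helper `find_datasets` are hypothetical illustrations; they are not taken from any metadata standard or from the OP/IP databases.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    """Descriptive metadata for one data set (hypothetical fields)."""
    identifier: str
    subject: str
    source: str
    owner: str
    terms_of_use: str
    coverage_start: date
    coverage_end: date


def find_datasets(catalog, subject, period_start, period_end):
    """Return identifiers of data sets on `subject` whose time coverage
    overlaps the interval [period_start, period_end]."""
    return [
        rec.identifier
        for rec in catalog
        if rec.subject == subject
        and rec.coverage_start <= period_end
        and rec.coverage_end >= period_start
    ]


# Hypothetical catalog entries, for illustration only.
catalog = [
    MetadataRecord("ds-001", "photoionization", "project A", "group A",
                   "open access", date(1995, 1, 1), date(2000, 12, 31)),
    MetadataRecord("ds-002", "electron impact excitation", "project B",
                   "group B", "open access", date(1998, 1, 1), date(2003, 12, 31)),
    MetadataRecord("ds-003", "photoionization", "project C", "group C",
                   "on request", date(2005, 1, 1), date(2009, 12, 31)),
]

print(find_datasets(catalog, "photoionization", date(1999, 1, 1), date(2001, 1, 1)))
# -> ['ds-001']
```

The search never touches the data themselves: only the descriptive record is consulted, which is what makes metadata-driven discovery cheap.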

Metadata standardization is important inasmuch as it allows the definition of common 
terminologies specifying entry, validation, access, integration and synthesis in automation, and 
ensures complete and accurate documentation of data-set content. There are different metadata 
standards available, namely Dublin Core, Darwin Core, Content Standard for Digital Geospatial
Metadata, ISO 19115 (geographic information metadata), Ecological Metadata Language, etc.
The reason for so many standards is the diversity of application fields: information science, 
biology, geology, ecology and cartography, to name but a few. 

Structures (classes) are composed of metadata items associated with descriptive semantic
definitions for some of the possible data attributes. Such structures can be arbitrarily simple
or very complex, and the information they contain can be heterogeneous depending on the data
types and needs of the associated communities. Within its metadata model, each community
can define a property or an item differently; for instance, the Dublin Core initiative specifies a
base set of 15 elements, while the Learning Object Metadata model, which is
being developed by the IEEE and other organizations to describe resources in teaching-learning
environments, has about 100 items [68, 139].
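As an illustration of how small the Dublin Core base set is, the sketch below serializes a record using only those 15 elements in the `http://purl.org/dc/elements/1.1/` namespace and rejects any other key. The wrapper element name `metadata` and the record values are arbitrary, hypothetical choices; this is not a complete implementation of any Dublin Core encoding guideline.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core(fields):
    """Serialize a dict of Dublin Core fields as XML, rejecting
    any key that is not one of the 15 base elements."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # arbitrary wrapper element
    for name in DC_ELEMENTS:  # emit elements in a fixed order
        if name in fields:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = fields[name]
    return ET.tostring(root, encoding="unicode")


# Hypothetical record values, for illustration only.
record = {
    "title": "Hypothetical photoionization cross sections",
    "creator": "Hypothetical Research Group",
    "date": "2009",
    "format": "text/plain",
    "rights": "Open access",
}
print(to_dublin_core(record))
```

A richer model such as Learning Object Metadata would need nested structures rather than this flat list, which is one reason communities end up with different standards.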

The incorporation of metadata demands an investment of time and effort by those who 
generate, preserve and share the data. It is advisable to budget for metadata-model
definition and for the implementation learning curve, as well as for maintenance costs in the
short, medium and long term. For the implementation of a metadata system to be successful,
there must be institutional commitment, i.e. acceptance by technical field staff, researchers,
students and computer and laboratory technicians.

All this effort takes place naturally in big-science experiments; however, it is also
necessary for research consortia, groups and individual investigators to become aware of the
importance of cataloging and preserving data. Only by doing so can the relevance of their data to
future generations be ensured [32].

Scientific Communities and DDCEs 

A network of digital data curation environments (DDCEs) is beginning to be established by
communities pursuing scientific knowledge for the consolidation and online analysis of the
digital avalanches produced by their instruments. Digital curation involves the structuring and
maintenance of such data for use by current and future generations (see Ref. [112]
and references therein). A DDCE is then a set of services and tools committed to capturing,
preserving, curating and disseminating data. From the information point of view, a DDCE tends
to be cumulative and permanently open, both in content and in the platform that supports it
[82, 83], and plays a key role in preserving and replicating the community's memory. In
fact, a DDCE is more than simply a store of data objects and a toolbox for data mining
and analysis, as researchers will need to interact with data within the environment of a social
network and with the Web 2.0 approach. This new and challenging reality therefore introduces
significantly greater complexity to data management [84].

Some progress has been made in developing simulation and data-capture tools, but much less
in tools for data analysis. It is common to build new instruments with little planning
or budgeting provision for data management [52]. Even less can be said about social software
with collaboration capabilities, which is the emerging paradigm for scientific activities. What
we really need are workflow environments that set up, with Web 2.0 functionalities, a pipeline
from the instrument or simulation directly to the DDCE (see articles in Ref. [69] and references
therein).
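The capture-curate-deposit pipeline described above can be sketched in a few lines of Python. Everything here is hypothetical: `Repository` is an in-memory stand-in for a DDCE, `toy_simulation` stands in for an instrument or code run, and SHA-256 content addressing is one possible design choice for identifiers, not a feature of any particular DDCE.

```python
import hashlib
import json
from datetime import datetime, timezone


class Repository:
    """In-memory stand-in for a DDCE repository (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def deposit(self, payload: bytes, metadata: dict) -> str:
        # Content-addressed storage: the identifier is the SHA-256 digest,
        # so any later corruption of the payload is detectable.
        identifier = hashlib.sha256(payload).hexdigest()
        self._objects[identifier] = {"payload": payload, "metadata": metadata}
        return identifier

    def get(self, identifier: str) -> dict:
        return self._objects[identifier]


def curate(raw: bytes, producer: str, code: str, parameters: dict) -> dict:
    """Attach provenance: who produced the data, with which code and inputs."""
    return {
        "producer": producer,
        "code": code,
        "parameters": parameters,
        "sha256": hashlib.sha256(raw).hexdigest(),
        "captured_utc": datetime.now(timezone.utc).isoformat(),
    }


def pipeline(simulate, producer, code, parameters, repo):
    """Capture -> curate -> deposit, with no manual steps in between."""
    raw = simulate(**parameters)
    metadata = curate(raw, producer, code, parameters)
    return repo.deposit(raw, metadata)


def toy_simulation(z, n_max):
    """Toy 'simulation': hydrogenic level energies E_n = -z^2/n^2 (Ry)."""
    return json.dumps([-z * z / n**2 for n in range(1, n_max + 1)]).encode()


repo = Repository()
ident = pipeline(toy_simulation, "group A", "toy-code 0.1", {"z": 1, "n_max": 3}, repo)
print(repo.get(ident)["metadata"]["parameters"])  # -> {'z': 1, 'n_max': 3}
```

The point of the design is that provenance (producer, code version, input parameters) is recorded at the moment the data emerge, rather than reconstructed later from correspondence.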

A DDCE must be compliant with the Open Archives Initiative (OAI); that is, the data
repositories should be available on the Internet, allowing any user to read, download, distribute,
print, search and link the data files, and to use them for any other lawful purpose without
financial, legal or technical barriers other than those associated with Internet access itself
[95].
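In practice, OAI-compliant repositories typically expose their metadata for harvesting through the OAI Protocol for Metadata Harvesting (OAI-PMH), a set of HTTP requests selected by a `verb` parameter. The sketch below only builds request URLs and makes no network call; the base URL is a placeholder, and defaulting `metadataPrefix` to `oai_dc` (unqualified Dublin Core, which OAI-PMH repositories must support) is a simplifying assumption.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}
# Verbs that need a metadataPrefix (unless a resumptionToken is given).
NEEDS_PREFIX = {"ListIdentifiers", "ListRecords", "GetRecord"}


def oai_pmh_url(base_url, verb, **args):
    """Build an OAI-PMH request URL; no request is sent."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if verb in NEEDS_PREFIX and "resumptionToken" not in args:
        args.setdefault("metadataPrefix", "oai_dc")
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


# 'from' is a Python keyword, so selective-harvesting dates go in a dict.
print(oai_pmh_url("https://repository.example.org/oai", "ListRecords",
                  **{"from": "2009-07-01"}))
# -> https://repository.example.org/oai?verb=ListRecords&from=2009-07-01&metadataPrefix=oai_dc
```

Because the protocol is plain HTTP with a small fixed vocabulary, any user or service can harvest a compliant repository without special software, which is the technical side of the "no barriers other than Internet access" requirement.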

In the next sections we describe some of the efforts that have been made by different 
communities to curate, disseminate and preserve their digital patrimony. 
CombeChem & e-Bank: a proof of concept



One of the first experiences in building a DDCE is taking place at the UK National
Crystallography Service (NCS). This service has developed, in collaboration with the CombeChem
e-science testbed and the eBank-UK project, an e-infrastructure to support a crystallographic
experiment from end to end. A proof of concept that integrates existing structure and proprietary
data sources has been developed within a grid-based information- and knowledge-sharing
environment [12].

Following the OAI concept, these projects have constructed a data repository, eCrystals,
that makes the raw and derived data from a crystallographic experiment available. The data
are uploaded into a repository where they are complemented with additional metadata (chemical
and bibliographic). This approach allows the rapid release of crystal structure data into the
public domain, and also provides mechanisms for value-added services that foster data
identification for further studies, while the ownership of the data is always retained by the
producer. Publication of all the results generated during the course of the experiment is thus
enabled by means of an Open Access Data Repository.

The end product of this project is a growing number of participating partners managing
data repositories. There are practical implications that reflect the changing nature of research
practices towards the data-intensive paradigm, with a variety of workflows in smart laboratories
where scientists require more tools and services for virtual experiments. The emerging eCrystals
Federation is thus gathering useful experience in creating a network of institutional ePrint
repositories [57].

Astronomy & Astrophysics: a leading e-data community 

Astronomical data are growing at an exponential rate from the continuous construction of new
telescopes with ever more sensitive detectors and from ambitious computational models. While
instruments produce steady data streams, there is an increasingly complex worldwide network of
facilities with large data outflows. The natural characteristics of the astronomical community
have made it an early builder and adopter of DDCEs, the most relevant being: a unified
taxonomy, vocabulary and coded definition of metrics and units; peer-reviewed data carefully
collected with rigorous statistical standards; trackable data provenance; publicly available data;
and the preservation of old data, which is essential for the study of time-varying phenomena.

For years this community has been building a knowledge platform that has revolutionized the
way astronomers use data, providing a role model to other disciplines on how
technology can be used to improve the quality and effectiveness of scientific enquiry [102].
These unique data services involve: three important data centers, namely the SAO/NASA
Astrophysics Data System (ADS), the Centre de Données astronomiques de Strasbourg (CDS)
and the NASA/IPAC Extragalactic Database (NED); the International Virtual Observatory
Alliance (IVOA) [51]; the public data releases from individual astronomical projects; the
preprint dissemination from the arXiv/astro-ph server; and the rapid dissemination of results
made possible by forward-thinking journals. Most of the data repositories are open-access
compliant and enthusiastically promote the Open Data Movement. In addition to the data services,
the IVOA offers the community a toolkit of online software for data analysis. Furthermore,
the astronomical community is sensitive to the "digital divide" and allows open access to
astronomers in developing countries.

http://www.combechem.org/
http://ecrystals.chem.soton.ac.uk/
http://www.adsabs.harvard.edu/

This leading community is now facing new kinds of difficulties that other emerging knowledge
communities should take into account, despite the variety of resources available.
International pressure against open access to astronomical data is appearing, and there does
not seem to be enough awareness to counteract it. Difficulties remain between journals and data
centers, and most of the published data are never posted by the latter. New instruments are still
being built with little planning or budgeting for data management, and there is little provision
for digital preservation [102].



ITER: an emerging community 

The International Thermonuclear Experimental Reactor (ITER) is an international large-scale
tokamak experiment aiming to demonstrate the possibility of producing commercial energy from
magnetic confinement fusion, which it expects to do about a decade from now. One of the greatest
challenges for the ITER systems will be the capability for analyzing, visualizing and assimilating
the data to support decision making; therefore, the scientific productivity of ITER will be
inextricably linked to the power of its collaborative infrastructure. Supporting remote
operation of the experimental facilities will require collaborative efforts involving over 2,000
scientists worldwide. The fusion community is moving towards the vision of remote hardware
control of the experiment [65, 117, 118], and is aware of the importance of data curation before
the launch of the experiment, as shown by the development of a set of software tools (MDSplus,
for Model Data System) for data acquisition and storage. In fact, MDSplus is more than that, as it
aims to be a methodology for the management of complex scientific data. Furthermore, it is
worth mentioning the effort to implement an atomic database infrastructure throughout the
project, namely the "Atomic Data and Analysis Structure for Fusion in Europe" (ADAS-EU).
This project aims to improve the effectiveness of data analysis in existing fusion experiments
and to prepare for ITER.



VAMDC 

The Virtual Atomic and Molecular Data Centre (VAMDC) is a multinational project launched
in July 2009 involving 24 research teams from several countries of the European Union (Austria,
France, Germany, Italy, Sweden and the United Kingdom), Russia, Serbia and Venezuela.
The Venezuelan node involves CeCalCULA and the IVIC Computational Physics Laboratory.
Atomic and molecular (A&M) data are required in a wide variety of scientific fields, e.g.
astrophysics, fusion plasmas, atmospheric physics and chemistry, environmental studies and
quantum optics, and in technological applications such as lighting, semiconductors,
nanotechnology and molecular biology. The VAMDC intends to upgrade and integrate at least 21 A&M
databases (TOPbase, TIPbase, OPserver, xstarDB and CHIANTI among them) in order
to implement an interoperable cyber-infrastructure for the exchange of atomic and molecular
data. Its principal investigator is Marie-Lise Dubernet (Université Pierre et Marie Curie and
Observatoire de Paris).

http://www.ivoa.net/
http://www.iter.org
http://www.mdsplus.org
http://www.adas-fusion.eu/
http://www.vamdc.org

The partners, roadmap and basic guidelines of the VAMDC have been described previously
[91], but the following dimensions can be emphasized.

• Network activities. These are aimed at coordinating infrastructure activities between
interdisciplinary fields and other relevant projects such as AstroGrid, ITER and Europlanet.
They also promote the VAMDC services and installations among data users from other
disciplines such as astrophysics, atmospheric science and fusion.

• Services. The VAMDC is establishing a cyber-infrastructure for A&M data producers 
and users, giving access to the main databases in an interoperable format, maintaining 
registries, dictionaries and demand nodes, and implementing distributed grid environ- 
ments for applications and data reservoirs. 

• Joint research. Interdisciplinary teams are developing computational tools to sustain 
the e-science platform of the VAMDC, defining standards, protocols and specifications 
for the storage, exchange, publication and mining of A&M data. 

The VAMDC is addressing some of the chronic problems in A&M data activities [89]. For
instance, owing to the lack of funds, standards and common guidelines, outstanding problems
in existing A&M databases are interoperability and data interfaces, which hamper productive
searches and data mining. Data exchange has also been carried out in a somewhat informal
manner (emails, undocumented ASCII files, peer-to-peer arrangements), even though standard
formats from specific client disciplines (e.g. the FITS astronomical data format) have been
incorporated. In this respect, one of the initial VAMDC policies has been the release of an
XML schema, referred to as XSAMS, for the exchange of A&M data, which is at present under
evaluation. Moreover, A&M data are stored in a variety of relational DBMSs using very diverse
data models. Many of these DBMSs are not standard SQL packages but local developments
that, in the long run, can compromise data integrity and regular updating procedures. In some
cases data are not even housed in databases but stored as plain files in data centers, on servers
belonging to scientific journals or in departmental, project and personal web pages. Therefore, XSAMS

http://www.astrogrid.org/

http://www.iter.org/

http://www.europlanet-eu.org/

http://heasarc.gsfc.nasa.gov/docs/heasarc/fits_overview.html
http://www-amdis.iaea.org/xsams/about.html
Figure 18: VAMDC virtual data-warehouse distributed structure.



becomes the key not only for data exchange but also for data identification and provenance 
when implementing a new generation of search engines. 
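The benefit of a single exchange format can be illustrated with a toy example: two hypothetical databases store the same quantity under different field names and units, and small adapters map both into one XML layout. The element and attribute names below are invented for illustration and are not the XSAMS schema; the energy used is the nonrelativistic, infinite-nuclear-mass hydrogenic value (1 - 1/n^2) Ry for n = 2, not data from any database mentioned here.

```python
import xml.etree.ElementTree as ET

RYDBERG_CM = 109737.31568  # Rydberg constant R_inf in cm^-1


def from_db_a(row):
    """Hypothetical database A: level energies stored in Rydberg units."""
    return {"species": row["ion"], "index": row["idx"],
            "energy_cm": row["E_ryd"] * RYDBERG_CM}


def from_db_b(row):
    """Hypothetical database B: level energies stored in cm^-1."""
    return {"species": row["spec"], "index": row["lev"],
            "energy_cm": row["wavenumber"]}


def to_exchange_xml(records):
    """Write normalized records in one uniform (illustrative) XML layout."""
    root = ET.Element("Levels")
    for rec in records:
        level = ET.SubElement(root, "Level", species=rec["species"],
                              index=str(rec["index"]))
        energy = ET.SubElement(level, "Energy", units="cm-1")
        energy.text = f"{rec['energy_cm']:.3f}"
    return ET.tostring(root, encoding="unicode")


row_a = {"ion": "H I", "idx": 2, "E_ryd": 0.75}
row_b = {"spec": "H I", "lev": 2, "wavenumber": 0.75 * RYDBERG_CM}
print(to_exchange_xml([from_db_a(row_a), from_db_b(row_b)]))
```

Once both sources emit the same structure with explicit units, a search engine can recognize that the two records describe the same quantity, which is the identification role attributed to XSAMS above.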

In Figure 18 we show the structure of the VAMDC, which may be visualized as a virtual data
warehouse addressing a collection of heterogeneous, distributed web services. To facilitate
end-user data mining, the warehouse contains registries, dictionaries, catalogs, workflows, metadata,
middleware and a uniform XML-based format for data transfers (XSAMS).
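The registry role in such a virtual warehouse can be sketched as follows: nodes register with a central registry, and a user query fans out to every node, with each result tagged by its node of origin. This is a toy model under stated assumptions: plain Python callables stand in for remote web services, and the node names and returned records are hypothetical; it does not reproduce the VAMDC middleware.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Service = Callable[[str], List[Record]]


class Registry:
    """Toy registry of data nodes; a query fans out to every node."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        self._nodes[name] = service

    def query(self, species: str) -> List[Record]:
        results: List[Record] = []
        for name in sorted(self._nodes):  # deterministic node order
            try:
                records = self._nodes[name](species)
            except ConnectionError:
                continue  # an unreachable node should not break the query
            # Tag each record with its node of origin (provenance).
            results.extend({**rec, "node": name} for rec in records)
        return results


# Hypothetical nodes standing in for distributed databases.
def node_a(species):
    return [{"species": species, "quantity": "energy levels"}] if species == "Fe II" else []


def node_b(species):
    return [{"species": species, "quantity": "radiative rates"}]


def node_down(species):
    raise ConnectionError("node unreachable")


registry = Registry()
registry.register("node-b", node_b)
registry.register("node-a", node_a)
registry.register("node-c", node_down)
for rec in registry.query("Fe II"):
    print(rec)
```

Skipping an unreachable node rather than failing the whole query is one reasonable policy for a federation of independently maintained databases; a production system would also report which nodes did not answer.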

OP/IP perspectives within VAMDC 

The current VAMDC project presents attractive possibilities for the OP/IP data activities,
particularly with respect to data exchange and preservation and database upgrading and
integration. One of the first tasks is to port the TIPTOPbase DBMSs, originally developed in
Fortran, to an open-source SQL package, namely MySQL. This work is currently in progress and
will facilitate database maintenance and integration within the virtual federation, as well as
making data manipulation more robust and less error-prone and broadening its query capabilities.
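To illustrate what a standard SQL back end offers over a custom Fortran query layer, the sketch below stores level energies in a relational table and retrieves them with a declarative, parameterized query. Assumptions: Python's built-in `sqlite3` module is used as a stand-in for MySQL so that the example runs without a server; the table and column names are hypothetical and are not the TIPTOPbase schema; the values are nonrelativistic hydrogenic energies E_n = -Z^2/n^2 Ry, not OP data.

```python
import sqlite3


def build_demo_db(z_values=(1, 2), n_max=4):
    """In-memory SQL table of hydrogenic levels (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE levels (
               z          INTEGER NOT NULL,  -- nuclear charge
               ion_charge INTEGER NOT NULL,  -- z - 1 for one-electron ions
               n          INTEGER NOT NULL,  -- principal quantum number
               energy_ry  REAL    NOT NULL,  -- E_n = -z^2/n^2 (Ry)
               PRIMARY KEY (z, ion_charge, n)
           )"""
    )
    rows = [(z, z - 1, n, -z * z / n**2)
            for z in z_values for n in range(1, n_max + 1)]
    conn.executemany("INSERT INTO levels VALUES (?, ?, ?, ?)", rows)
    return conn


conn = build_demo_db()
# Parameterized query: He II (z = 2) levels lying above -1 Ry.
query = "SELECT n, energy_ry FROM levels WHERE z = ? AND energy_ry > ? ORDER BY n"
for n, energy in conn.execute(query, (2, -1.0)):
    print(n, round(energy, 4))
```

The primary key enforces that each level is stored once, which is the kind of integrity guarantee that ad hoc local developments tend to lack.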

Data preservation is a relevant issue, as most OP/IP members are retiring or close to
retirement. Thus, the atomic data sets must be kept well documented with metadata to ensure not
only their future usage but also their identity for the new generation of search engines. The
VAMDC is defining and establishing standard metadata and XML schemata, which need to be
adopted by the OP/IP. Furthermore, with regard to data activities, we have had very good
experience in the past working with data centers (e.g. CDS) and supercomputer facilities (e.g.
OSC, CeCalCULA), and perhaps these alliances should be cultivated even further in upcoming
developments.
Within the VAMDC time scale, it is timely to complete and upgrade TIPTOPbase with
new data. For instance, the original OP neglected some chemical elements such as P,
Cl, K and astrophysically deficient members of the iron group, which could now be calculated
and included in TOPbase. Also, the IP should perhaps reconsider treating the electron impact
excitation of Ni in a systematic manner similar to that of Fe. With respect to radiative data, and in
order to generate accurate massive data sets, the atomic structure code AUTOSTRUCTURE
can now be used to compute bound-bound relativistic A-values for transitions involving
empirically corrected levels with high principal quantum numbers. Furthermore, with the recent
inclusion of the Breit operators in BPRM, a revision of the opacities in intermediate coupling
could perhaps be envisaged.

There is a final aspect that we must address, namely that of data-intensive application
deployment. As atomic data sets become more voluminous, effective data mining will have to
depend on innovative methods and computer tools. It will become database-centric, with the
application run close to where the data reside rather than downloading the data sets to
the place where the application is run. At present, atomic data-intensive applications,
e.g. astrophysical spectral modeling codes (XSTAR, CLOUDY and TLUSTY), are usually
downloaded from a web site, installed locally and run on sequential processors. This scheme
is limited by database volume and maintenance, and is difficult to adapt and tune to the new
distributed (e.g. the grid) and virtual (e.g. the cloud) computing environments. Thus, new
approaches for application deployment must be devised, such as workflows (e.g. Taverna), grid
portals (e.g. GENIUS) and virtual machines (e.g. VMware). Within the VAMDC project
and with NASA support, we are using XSTAR as a case study of application deployment in
the new cyber-infrastructure.
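The difference between the two deployment models can be made concrete with a toy comparison. In this sketch, a hypothetical `DataNode` counts the bytes it would ship, either the whole data set (the download-and-run-locally model) or only the result of an analysis run next to the data (the database-centric model). Assumptions: everything runs in one process, JSON size is used as a proxy for network traffic, and the data set is the illustrative hydrogenic series E_n = -1/n^2 Ry; in a real deployment the analysis would be shipped as a workflow, portal job or virtual machine, and nothing here reflects how XSTAR itself is deployed.

```python
import json


class DataNode:
    """Toy remote node; counts the bytes it would send over the network."""

    def __init__(self, records):
        self._records = records
        self.bytes_sent = 0

    def _send(self, obj):
        payload = json.dumps(obj).encode()
        self.bytes_sent += len(payload)
        return json.loads(payload)

    def download_all(self):
        """Classic model: ship the whole data set to the application."""
        return self._send(self._records)

    def run_near_data(self, func):
        """Data-centric model: run the analysis here, ship only the result."""
        return self._send(func(self._records))


def count_shallow_levels(records):
    """Count levels bound by less than 0.01 Ry (toy analysis)."""
    return sum(1 for r in records if r["energy_ry"] > -0.01)


# Hydrogenic levels E_n = -1/n^2 Ry for n = 1..1000 (illustrative data set).
records = [{"n": n, "energy_ry": -1.0 / n**2} for n in range(1, 1001)]

downloading = DataNode(records)
local_result = count_shallow_levels(downloading.download_all())

remote = DataNode(records)
remote_result = remote.run_near_data(count_shallow_levels)

print(local_result, remote_result)                # same answer
print(downloading.bytes_sent, remote.bytes_sent)  # very different traffic
```

Both models give the same answer; what changes is that the traffic in the data-centric model scales with the size of the result rather than with the size of the database.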

Concluding remarks 

In our review of the data activities of the OP and IP, there are a few points we would like to
develop briefly. The first is the remarkable duration of these international
collaborations, which have spanned more than 25 years of active data production. Rather than
praising their stability and cohesion, we would just like to emphasize that the computation
of accurate data simply takes time and experience. Moreover, one would expect the data
produced by these large-scale efforts to be definitive, but as discussed here, there is always
room for data refinement, for improving the unavoidable approximations, for resolving newly emerging
discrepancies and for periodic revisions. These are some of the reasons behind their ongoing
status. With respect to addenda or revisions, we have suggested some firsthand cases, but
the present solar abundance problem is indeed an incentive for re-examining the radiative
properties, particularly those involving the inner shells, of some of the more complex ionic

http://heasarc.gsfc.nasa.gov/docs/software/xstar/xstar.html
http://www.nublado.org/
http://nova.astro.umd.edu/
species. In any case, as shown by the OP, IP and OPAL, relevant scientific problems can
actually be solved by curating and analyzing large volumes of data, and with the breathtaking
evolution of ICT, these approaches are likely to dominate in the near future.

As we have illustrated here with the OP/IP, large data productions are usually carried
out by global collaborations, i.e. multinational science groups that generate large volumes of
data which are geographically distributed and maintained only during the project life cycle. It
is imperative to plan data preservation policies and protocols such that these efforts are not
wasted in the long run. The increasing awareness of the need to preserve and share data for use
and reuse in knowledge creation has to support and promote the emerging Open Data Movement and
its actions. As mentioned previously, we are confident that the current initiatives in the A&M
community, e.g. the VAMDC, will seriously look into this problem. An initial step in the right
direction has been the prompt release of an XML schema (XSAMS) for A&M data exchange.
What is required now is to improve and normalize metadata identification, and since A&M data
are used in different fields, key aspects are simplicity, flexibility and semantic interoperability.

It is also interesting to point out the existing dichotomy between supercomputer centers and
data centers. In our data activities in the OP/IP, we have worked successfully with both
entities where their support, technical advice and security consulting were always invaluable, 
but they are in themselves very different in nature and in the services they provide. We are of 
the opinion that the essentials of the new cyber-infrastructure will promote the emergence of 
virtual distributed environments where high-performance computing will take place close to the 
data repositories. Thus, both supercomputer and data centers will have to evolve and perhaps 
merge, and the deployment of data-intensive applications will engage innovative schemes that 
could be somewhat disconcerting to both data producers and users. 

In some of the data activities of the OP/IP we were lucky to collaborate with a first-class
computer scientist who supported our database developments. Due to the complexities of
the new cyber-infrastructure, which are beyond the expertise of most computational physicists,
reliance on skilled computer scientists and interdisciplinary teams is likely to become vital in data
dissemination. This would require funding well above the levels available
during the course of the OP and IP, but it appears that funding agencies are now more aware
of the importance of data projects.

Finally, we consider that the current trend for data dissemination and preservation points 
to the extensive use of DDCEs in conjunction with data factories, instruments and sensors. As 
previously mentioned, a DDCE is a set of services for capturing, curating, disseminating and 
preserving data from research projects conducted by knowledge communities such as those we 
are currently developing within the VAMDC project. 